# Goals — Easy-Multiplayer v2 Redesign

This document is the **definition-of-done index** for the v2 redesign. Each goal is independently verifiable. For ordering and task tracking, see `TASKS.md`. For the design vision, see `easy_multiplayer_redesign_concretized_architecture.md`.

## How to Read This Document

Each goal has four fields:

- **Why** — the motivation; what's blocked without this
- **Deliverable** — the concrete artifact(s) produced
- **Success** — observable outcomes that mean "done"
- **Verify** — specific, mechanical steps anyone (human or AI) can perform to check

Goals are grouped by phase:

- **Phase A** — Foundation: pluggable transport, deterministic test harness, scenario catalog. No protocol changes.
- **Phase B** — Protocol redesign: implemented TDD-style against the Phase A harness, one goal at a time.
- **Phase C** — Polish, scale validation, and the bus subsystem.
- **Meta** — Cross-cutting goals (documentation, scenario coverage, API stability) that apply across phases.

---

# Phase A — Foundation

## Goal A1 — Abstract Transport Interface

**Why:** Without a clean Layer-1 boundary we cannot build a deterministic test harness, run scenarios against multiple transports, or eventually plug in a server-authoritative backend. This is the foundation everything else builds on.

**Deliverable:**
- `TRANSPORT_SPEC.md` — formal contract: methods, semantics, ordering guarantees, error model, message-size and backpressure expectations
- A JS interface barrel or documented duck-type that implementations follow

**Success:**
- The interface is fully written down before any implementation begins
- All required capabilities are specified: send (unicast), broadcast, receive callback, peer-join/leave events, transport-level heartbeat, peer list, opaque local ID

**Verify:**
- Open `TRANSPORT_SPEC.md`; confirm every method has a signature, a semantic description, and at least one usage example
- Pose this spot-check question: *"If a peer goes silent, how does Layer 1 distinguish 'still alive' from 'gone'?"* The spec must answer this directly without referring to Layer 2 logic.
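As a rough illustration of the "documented duck-type" option, the sketch below lists one possible set of Layer-1 method names and checks an object against them. The names (`send`, `broadcast`, `onMessage`, `onPeerJoined`, `onPeerLeft`, `getPeers`, `localId`) are assumptions chosen to match the capability list above. `TRANSPORT_SPEC.md` is the authority on the real names and signatures.

```javascript
// Hypothetical duck-type for the Layer-1 Transport contract (A1).
// Method names are illustrative assumptions, not the names TRANSPORT_SPEC.md fixes.
const TRANSPORT_METHODS = [
  "send", // send(peerId, message, opts?): unicast
  "broadcast", // broadcast(message, opts?): every current peer
  "onMessage", // onMessage((message, fromPeerId) => void): receive callback
  "onPeerJoined", // onPeerJoined((peerId) => void)
  "onPeerLeft", // onPeerLeft((peerId) => void), driven by the transport-level heartbeat
  "getPeers", // getPeers(): string[]
];

// Throws a TypeError naming every missing capability; returns the object unchanged otherwise.
function assertTransportShape(candidate) {
  const missing = TRANSPORT_METHODS.filter((name) => typeof candidate[name] !== "function");
  if (typeof candidate.localId !== "string") missing.push("localId (opaque string)");
  if (missing.length > 0) {
    throw new TypeError(`Not a Transport; missing: ${missing.join(", ")}`);
  }
  return candidate;
}
```

A check like this could run at harness startup, so a transport that silently lacks a capability fails loudly before any scenario runs.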

---

## Goal A2 — Deterministic In-Process Transport

**Why:** TDD requires a controllable, fast, deterministic substrate. Wall-clock and real network make scenarios flaky and slow.

**Deliverable:**
- `test-harness/MemoryTransport.js` — implements `Transport` for in-process N peers
- `test-harness/VirtualClock.js` — deterministic time source
- `test-harness/NetworkSim.js` — per-link latency, jitter, drop rate, duplication, reordering, partition control

**Success:**
- A scenario with 10 peers and 60 simulated seconds runs in under 1 wall-clock second
- Re-running the same scenario with the same seed produces byte-identical message logs
- Any peer pair can be partitioned and re-joined at arbitrary virtual times

**Verify:**
- Run a "send 100 packets, advance clock 10s, observe deliveries" smoke test twice with the same seed; logs match exactly
- Run a partition test: cut link A↔B at virtual t=5s, send messages from A, advance the clock to t=10s, and confirm B received nothing. Then reconnect, advance to t=15s, and confirm that messages sent during the partition are either delivered late or dropped, whichever `TRANSPORT_SPEC.md` specifies
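The sketch below shows how the first smoke test can be byte-identical across runs. A virtual clock fires scheduled deliveries only when the test advances it, and a seeded PRNG (mulberry32, a well-known small generator) drives drop and jitter decisions. It is an illustration of the technique, not the contents of `VirtualClock.js` or `NetworkSim.js`. The drop rate and latency range are arbitrary choices.

```javascript
// Seeded PRNG (mulberry32): same seed, same sequence, on every run and machine.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Virtual time: nothing happens until the test calls advanceTo().
class VirtualClock {
  constructor() {
    this.now = 0;
    this.queue = [];
    this.seq = 0; // insertion counter; breaks ties between events at the same time
  }
  schedule(atMs, fn) {
    this.queue.push({ atMs, seq: this.seq++, fn });
  }
  advanceTo(targetMs) {
    for (;;) {
      this.queue.sort((x, y) => x.atMs - y.atMs || x.seq - y.seq);
      const next = this.queue[0];
      if (!next || next.atMs > targetMs) break;
      this.queue.shift();
      this.now = next.atMs;
      next.fn();
    }
    this.now = targetMs;
  }
}

// "Send 100 packets, advance 10 s, observe deliveries" over a lossy, jittery link.
function runSmokeTest(seed) {
  const clock = new VirtualClock();
  const rand = mulberry32(seed);
  const log = [];
  for (let i = 0; i < 100; i++) {
    if (rand() < 0.1) continue; // 10% drop rate
    const latencyMs = 20 + Math.floor(rand() * 30); // 20..49 ms of jitter
    clock.schedule(i * 10 + latencyMs, () => log.push(`t=${clock.now} pkt=${i}`));
  }
  clock.advanceTo(10000);
  return log;
}
```

Because no wall-clock or `Math.random` call is involved, `runSmokeTest(42)` yields the same log on every run, which is exactly the property the first Verify bullet checks.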

---

## Goal A3 — Multi-Node Test Harness

**Why:** Scenarios in `TEST_SCENARIOS.md` need a uniform way to spin up N peers, drive them through actions, and assert convergence properties.

**Deliverable:**
- `test-harness/PeerHarness.js` — spawn an EasyMultiplayer instance wired to `MemoryTransport`
- `test-harness/SceneRunner.js` — execute a scenario object `{peers, actions[], assertions[]}`
- At minimum the following assertion primitives:
  - `expectConverged(peers, atVirtualTime)` — all peers agree on state hash at the given virtual time
  - `expectTick(peerId, tickNumber)`
  - `expectQueryResult(peerId, tick, queryId, value)`
  - `expectMessageVolume(peerId, lessThan)`
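A minimal shape for the first assertion primitive might look like the following. It assumes each harness peer object exposes an `id` and a hypothetical `stateHashAt(virtualTimeMs)` method. The real `PeerHarness` may look up peers and hashes differently.

```javascript
// Sketch of an expectConverged-style assertion (A3). Assumes a hypothetical
// peer shape { id, stateHashAt(virtualTimeMs) }; the real harness API may differ.
function expectConverged(peers, atVirtualTime) {
  const hashes = peers.map((p) => ({ id: p.id, hash: p.stateHashAt(atVirtualTime) }));
  const distinct = new Set(hashes.map((h) => h.hash));
  if (distinct.size !== 1) {
    // Name every peer's hash so a failing scenario shows who diverged.
    const detail = hashes.map((h) => `${h.id}=${h.hash}`).join(", ");
    throw new Error(`Not converged at t=${atVirtualTime}ms: ${detail}`);
  }
}
```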

**Success:**
- A trivial scenario ("3 peers, no inputs, advance 10s, all converged") passes end-to-end
- Total harness code stays under ~500 lines (a heuristic for clean boundaries)

**Verify:**
- Run the trivial convergence scenario
- Read `PeerHarness.js`; confirm no direct imports of `WorldNetworkCommunicator` or `trystero` — only the `Transport` interface

---

## Goal A4 — English Test Scenario Catalog (initial)

**Why:** Many novel v2 behaviors have edge cases nobody has fully thought through. Writing scenarios in English first forces the edge cases out before any code commits to a behavior.

**Deliverable:**
- `TEST_SCENARIOS.md` populated with the first ~40 scenarios spanning categories 1–5 (determinism, basic rollback, sparse inputs, silence, attendance)
- Each scenario uses the template documented at the top of the file
- Each scenario has an explicit "Open questions" field, blank if none

**Success:**
- Reading the 40 scenarios surfaces at least 3 design questions not previously noted in `KNOWN_ISSUES.md` (proves the exercise was productive)
- A technically competent person could pick any scenario and translate it directly into a JS test

**Verify:**
- Count: ≥40 scenarios with IDs in `S-001-XX` through `S-005-XX`
- Pick one random scenario from each category; confirm preconditions, actions, expectations are unambiguous

---

## Goal A5 — Refactor Current Networking Behind Transport Interface

**Why:** Lock in the boundary before any protocol change so we don't refactor the same code twice.

**Deliverable:**
- `transports/TrysteroTransport.js` — wraps current `WorldNetworkCommunicator` behavior
- `WorldNetworkCommunicator.js` removed or reduced to a thin re-export for back-compat
- All `trystero` imports outside `transports/` removed

**Success:**
- Existing browser test pages (`test-suite/`) still function
- All current Vitest tests pass (once A6 is done)

**Verify:**
- `grep -r "trystero" --exclude-dir=transports --exclude-dir=node_modules` returns no hits
- Run existing test suite — same pass/fail as before the refactor

---

## Goal A6 — Test Environment Repair

**Why:** Current Vitest setup errors out under Node 12 (`node:fs/promises` missing). Phase B is TDD-dependent.

**Deliverable:**
- Vitest runs cleanly, whether via a Node upgrade, a pinned Vitest version, or a documented invocation.

**Success:**
- `npm test` exits 0 (or with real, meaningful test failures — not a module-resolution crash)

**Verify:**
- Run `npm test`; observe a Vitest summary, not a `node:fs/promises` error

---

# Phase B — Protocol Redesign

Each Phase B goal is "make these scenario IDs in the catalog pass". Goals are roughly ordered by dependency.

## Goal B1 — Sparse Change-Only Input Protocol

**Why:** Massive bandwidth reduction; foundation for passive participation and silence semantics. Aligns with the design doc's central scalability claim.

**Deliverable:**
- New input message format: `{tick, intent}` shipped only on changes
- Decoder that reconstructs the continuous input stream from sparse events
- Migration of existing per-tick input code paths

**Success:**
- A participant holding one input for 100 ticks sends ~1 input packet (plus periodic attendance)
- Reconstructed stream is identical to a tick-by-tick stream from the receiver's perspective

**Verify:**
- Scenarios in category "Sparse input correctness" pass
- Message-volume assertion: 100-tick held input → fewer than 5 input packets
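The change-only idea fits in a few lines. The sender emits `{tick, intent}` only when the intent differs from the last one sent, and the receiver answers "intent at tick T" by holding the latest event at or before T. This sketch compares intents by their JSON text for brevity, which is an assumption; the class names are hypothetical and this is not the shipped `SparseInputDecoder`.

```javascript
// Sketch of change-only input encoding (B1). JSON comparison is a simplification.
class SparseInputEncoder {
  constructor() {
    this.lastKey = undefined; // nothing sent yet, so the first intent always ships
  }
  // Returns a {tick, intent} packet only when the intent changed, else null.
  encode(tick, intent) {
    const key = JSON.stringify(intent);
    if (key === this.lastKey) return null;
    this.lastKey = key;
    return { tick, intent };
  }
}

class SparseInputDecoder {
  constructor() {
    this.events = []; // kept sorted by tick
  }
  receive(packet) {
    this.events.push(packet);
    this.events.sort((a, b) => a.tick - b.tick);
  }
  // Hold-last: the intent in effect at `tick`, or null if nothing was ever heard.
  intentAt(tick) {
    let current = null;
    for (const e of this.events) {
      if (e.tick > tick) break;
      current = e.intent;
    }
    return current;
  }
}
```

Holding "right" for 100 ticks produces exactly one packet, yet `intentAt` answers for every tick in that range, which is the "reconstructed stream is identical" property.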

---

## Goal B2 — Transport-Level Heartbeat + Liveness Separation

**Why:** With silence-as-unchanged, liveness must come from a separate channel — otherwise we conflate "quiet" with "gone".

**Deliverable:**
- Attendance message + cadence at Transport layer
- `onPeerJoined` / `onPeerLeft` semantics formalized
- Simulation layer no longer infers liveness from input flow

**Success:**
- A silent-but-alive peer remains in the peer list indefinitely
- A truly-gone peer is reported `onPeerLeft` within ≤2× attendance interval

**Verify:**
- "Silent peer" and "true disconnect" scenarios pass
- Attendance interval is tunable; default documented in `PROTOCOL_SPEC.md`
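To show the separation concretely, here is a small liveness tracker fed only by attendance messages, with a timeout of twice the attendance interval to mirror the Success criterion. The class and method names are hypothetical. Detection latency here also depends on how often `sweep` is called.

```javascript
// Sketch of attendance-based liveness (B2), independent of input flow.
class LivenessTracker {
  constructor(attendanceIntervalMs, onPeerLeft) {
    this.timeoutMs = 2 * attendanceIntervalMs;
    this.lastSeen = new Map(); // peerId -> time of last attendance message
    this.onPeerLeft = onPeerLeft;
  }
  // Call for attendance messages only; input messages are deliberately NOT counted.
  recordAttendance(peerId, nowMs) {
    this.lastSeen.set(peerId, nowMs);
  }
  // Call periodically; reports each peer silent past the timeout exactly once.
  sweep(nowMs) {
    for (const [peerId, seenMs] of this.lastSeen) {
      if (nowMs - seenMs > this.timeoutMs) {
        this.lastSeen.delete(peerId); // deleting during Map iteration is safe in JS
        this.onPeerLeft(peerId);
      }
    }
  }
  peers() {
    return [...this.lastSeen.keys()];
  }
}
```

A peer that never sends input but keeps sending attendance stays in `peers()` forever; a peer whose attendance stops is reported once the gap exceeds `2 × attendanceIntervalMs`.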

---

## Goal B3 — Context-Aware Intent Construction

**Why:** Prevents rollback reinterpretation bugs (the A button means "jump" in gameplay but "confirm" in a dialog). Enables emergent passive participation via `null` returns.

**Deliverable:**
- Public API: `getLocalInputs(localGameState)` returning intent object or `null`
- `null` formalized as "passive participant this tick"
- Migration path from per-field `defineInput` (back-compat shim or breaking change — decide during implementation)

**Success:**
- A participant returning `null` ships no input packets and creates no rollback pressure
- The "A=jump vs A=confirm under rollback" scenario produces the locally-bound meaning

**Verify:**
- Passive-participant scenario: confirm zero input messages over 10 simulated seconds
- Context-bound-intent scenario passes
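From the game's side, a `getLocalInputs` callback might look like the sketch below. The state fields (`spectating`, `dialogOpen`) and the key reader are hypothetical. The point it illustrates is that the meaning of "A" is decided from local state when the intent is built. A later rollback therefore replays "confirm" or "jump", never the raw button.

```javascript
// Illustrative game-side getLocalInputs (B3); state and key fields are hypothetical.
function makeGetLocalInputs(readKeys) {
  return function getLocalInputs(localGameState) {
    const keys = readKeys();
    if (localGameState.spectating) return null; // passive participant this tick
    if (localGameState.dialogOpen) {
      // In a dialog, A is bound to "confirm" at construction time.
      return keys.a ? { type: "confirm" } : { type: "idle" };
    }
    // In gameplay, the same physical button is bound to "jump".
    const dx = (keys.right ? 1 : 0) - (keys.left ? 1 : 0);
    return { type: keys.a ? "jump" : "idle", dx };
  };
}
```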

---

## Goal B4 — Predicate Context Freezing

**Why:** Closure-over-live-state silently produces nondeterministic re-evaluation — a class of bug we want to make impossible.

**Deliverable:**
- `query(playerId, ctx, (input, ctx) => predicate)` signature
- **Debug mode**: ctx is duplicated (efficient deep-clone, not JSON round-trip) on query; re-evaluation compares; mutations warn/error with predicate location
- **Production mode**: ctx trusted as-is, zero overhead
- Choice of deep-clone implementation documented (open question in `KNOWN_ISSUES.md` #5)

**Success:**
- A predicate that incorrectly references mutable game state produces a loud debug-mode warning naming the predicate location
- Production-mode benchmarks show no measurable overhead vs raw closure

**Verify:**
- "Predicate context freezing" scenarios pass
- Write a deliberately-mutating predicate; confirm debug mode catches it
- Toggle debug → production; confirm warning disappears
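The debug/production split can be sketched as a wrapper around the predicate. In debug mode the ctx is deep-copied at query time, every evaluation runs on a fresh copy, and two things are flagged: a mutated copy, and a different answer for the same input on re-evaluation. The second is the symptom of closing over live state. The hand-rolled `deepClone` only handles plain objects, arrays, and primitives. It is a stand-in, since the real clone choice is the open question in `KNOWN_ISSUES.md` #5. `freezeQuery` and its options are hypothetical names.

```javascript
// Plain-data deep clone (objects, arrays, primitives); a stand-in, not the chosen library.
function deepClone(v) {
  if (v === null || typeof v !== "object") return v;
  if (Array.isArray(v)) return v.map(deepClone);
  const out = {};
  for (const k of Object.keys(v)) out[k] = deepClone(v[k]);
  return out;
}

function deepEqual(a, b) {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const ka = Object.keys(a);
  const kb = Object.keys(b);
  return ka.length === kb.length && ka.every((k) => deepEqual(a[k], b[k]));
}

// Returns an evaluator for one query. Production: ctx trusted, zero copies.
function freezeQuery(ctx, predicate, { debug, warn, location }) {
  if (!debug) return (input) => predicate(input, ctx);
  const frozen = deepClone(ctx); // snapshot taken at query time
  const answers = new Map(); // input (as JSON) -> first answer seen
  return (input) => {
    const scratch = deepClone(frozen);
    const answer = predicate(input, scratch);
    if (!deepEqual(scratch, frozen)) warn(`predicate at ${location} mutated its ctx`);
    const key = JSON.stringify(input);
    if (answers.has(key) && answers.get(key) !== answer) {
      warn(`predicate at ${location} answered differently on re-evaluation`);
    }
    answers.set(key, answer);
    return answer;
  };
}
```

A predicate that reads a live variable instead of `ctx` passes its first evaluation and is caught on the first re-evaluation after that variable changes, which matches the "deliberately-mutating predicate" check in Verify.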

---

## Goal B5 — Hash Window Broadcasts + Uncertainty-Aware Desync

**Why:** Eager per-tick hash comparison creates false-positive desyncs under sparse input synchronization. This is the conceptually hardest piece of v2.

**Deliverable:**
- Periodic hash window broadcast: `{oldestTick, interval, stateHashes[], usedInputs[]}`
- Desync decision logic: mismatch + no unresolved relevant uncertainty → real desync; mismatch + uncertainty → wait

**Success:**
- Two peers with different hashes but pending acceptance-window inputs do NOT trigger recovery
- Two peers with truly divergent state (no uncertainty) trigger recovery within bounded time

**Verify:**
- Both "false-positive desync avoidance" and "true-positive desync detection" scenarios pass
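One way to express the decision rule over a received window is sketched below. It uses the `{oldestTick, interval, stateHashes[]}` fields from the Deliverable and leaves out `usedInputs[]`. The hypothetical `earliestUncertainTick` stands in for however the engine tracks pending inputs. It is the earliest tick that a still-open acceptance-window input could change, and every hash at or after it is provisional. Mismatches before it are settled, so they count as real desyncs.

```javascript
// Sketch of uncertainty-aware desync classification (B5).
// Returns "ok", "wait" (mismatch that pending input may still fix), or "desync".
function classifyWindow(local, remote, earliestUncertainTick) {
  let verdict = "ok";
  for (let i = 0; i < remote.stateHashes.length; i++) {
    const tick = remote.oldestTick + i * remote.interval;
    const j = (tick - local.oldestTick) / local.interval;
    // Only compare ticks both windows actually sampled.
    if (!Number.isInteger(j) || j < 0 || j >= local.stateHashes.length) continue;
    if (local.stateHashes[j] === remote.stateHashes[i]) continue;
    if (tick < earliestUncertainTick) return "desync"; // settled tick disagrees
    verdict = "wait"; // could still be explained by an input not yet applied
  }
  return verdict;
}
```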

---

## Goal B6 — Tunable Acceptance + Grace Windows

**Why:** Different games need different latency tolerances; hard-coded numbers are wrong for everyone.

**Deliverable:**
- Configurable `acceptanceWindowMs`, `graceWindowMs`, `snapshotIntervalTicks`, `attendanceIntervalMs`
- Documented defaults with rationale recorded in `DECISIONS.md` once chosen

**Success:**
- Inputs in acceptance window are accepted directly
- Relayed already-accepted inputs in grace window are accepted; raw late inputs in grace window are not accepted directly
- Beyond grace window: rejected; may trigger recovery

**Verify:**
- "Window edges" scenarios pass at the configured numbers, and at edge values (±1ms either side of each boundary)
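The three Success bullets reduce to a small classifier. This sketch makes assumptions the spec must still pin down. `ageMs` is how far in the past the input's tick lies when it arrives, `graceWindowMs` is measured from the same origin (so it exceeds `acceptanceWindowMs`), and both boundaries are inclusive. The ±1 ms edge scenarios exist to fix exactly those choices.

```javascript
// Sketch of B6 window classification; boundary inclusivity is an assumption.
function classifyInput(ageMs, relayedAccepted, windows) {
  const { acceptanceWindowMs, graceWindowMs } = windows;
  if (ageMs <= acceptanceWindowMs) return "accept"; // inside acceptance: accept directly
  if (ageMs <= graceWindowMs) {
    // Inside grace: only inputs some peer already accepted (and relayed) get in.
    return relayedAccepted ? "accept" : "reject";
  }
  return "reject-beyond-grace"; // may trigger recovery
}
```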

---

## Goal B7 — `queryDisconnected` + Disconnect-as-Simulation-Event

**Why:** Disconnects must participate in rollback determinism the same way inputs do. Agreement on the disconnect tick is solved explicitly, decentrally, and SPARSELY (decision #30: attendance messages are never proactively forwarded), not by a heavyweight consensus protocol (this supersedes the retired #17).

**Deliverable:**
- `queryDisconnected(playerId)` API
- Local canonical disconnect tick = `lastAttendanceTick + timeoutTicks` — a deterministic grow-only-max merge (decision #29; pure core in `DisconnectTracker.js`) — DONE
- Sparse cross-network convergence (decision #30): beat FORWARDING (grow-only-max gossip) + B5-desync / B8-recovery fallback carrying last-attendance-tick (fallback done in B8 via `mergeLastAttendanceTicks`). **SUPERSEDED 2026-06-05:** the original pull-on-suspicion probe (Goal B7.1, `DisconnectProbe.js` + `ProbeNode`) is DELETED — code + tests removed, engine unwired; a 2-trip probe can't rescue what 1-trip reliable gossip can't, and a single dropped one-shot correction caused a false disconnect. See `DESIGN_PARTICIPATION.md` §6.1/§6.2 and DECISIONS #30.

**Success:**
- All peers eventually agree on per-participant canonical disconnect tick (eventual, within a connected component)
- A `queryDisconnected`-conditional code path rolls back when the canonical tick shifts

**Verify:**
- "Disconnect agreement" scenarios pass (local determinism done in B7; sparse-convergence fast path via #30 beat forwarding, which replaced the deleted B7.1 probe; long-partition fallback in B8)
- "Disconnect-conditional rollback" scenario passes
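The local rule `lastAttendanceTick + timeoutTicks` with a grow-only-max merge can be sketched as follows. This is an illustration, not `DisconnectTracker.js`. The merge is order-independent and idempotent, so peers converge whatever order they receive evidence in. A merge that raises a last-attendance tick shifts the disconnect tick later, and any tick in between that branched on the query has to be re-simulated.

```javascript
// Grow-only-max merge of per-player last-attendance ticks (commutative, idempotent).
function mergeLastAttendance(a, b) {
  const out = new Map(a);
  for (const [playerId, tick] of b) {
    out.set(playerId, out.has(playerId) ? Math.max(out.get(playerId), tick) : tick);
  }
  return out;
}

// Canonical disconnect tick, or null if the player was never seen.
function disconnectTick(lastAttendanceTicks, playerId, timeoutTicks) {
  return lastAttendanceTicks.has(playerId) ? lastAttendanceTicks.get(playerId) + timeoutTicks : null;
}

// What a queryDisconnected-style check would answer while simulating `tick`.
function isDisconnectedAt(lastAttendanceTicks, playerId, timeoutTicks, tick) {
  const d = disconnectTick(lastAttendanceTicks, playerId, timeoutTicks);
  return d !== null && tick >= d;
}
```

For example, if peer A last heard from p1 at tick 100 and peer B at tick 120, then with a 30-tick timeout A alone would place the disconnect at 130. After merging B's evidence, it moves to 150, and A rolls back any `queryDisconnected`-conditional path evaluated in [130, 150).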

---

## Goal B8 — Authority + Severe-Desync Recovery

**Why:** When peers diverge beyond the grace window, the system must converge — without a permanent master peer.

**Deliverable:**
- Lightweight authority comparison: older-sim wins; lower-ID breaks ties
- State challenge / full-state transfer flow (also the fallback that reconciles disconnect-tick disagreement when the #30 fast path fails to converge; the transfer carries per-participant last-attendance-tick)
- Lagging-peer simulation-age reset when "very behind"

**Success:**
- A peer that fell several seconds behind catches up without dominating authority on rejoin
- Two divergent groups converge to one canonical history after recovery completes

**Verify:**
- "Severe desync recovery" and "lagging peer wake" scenarios pass
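The authority rule is a total order, which a comparator captures directly. The sketch models "older simulation" as a larger hypothetical `simAgeTicks` and compares peer ids as strings. Both are assumptions about representation, not the fields `Recovery.js` uses.

```javascript
// Sketch of the B8 authority order: older simulation wins; lower peer id breaks ties.
function compareAuthority(a, b) {
  if (a.simAgeTicks !== b.simAgeTicks) return b.simAgeTicks - a.simAgeTicks; // older first
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0; // tie: lower id first
}

// The most authoritative candidate; copies before sorting so the input is untouched.
function pickAuthority(candidates) {
  return [...candidates].sort(compareAuthority)[0];
}
```

Because ids are unique, no two distinct peers compare equal, so every peer that sees the same candidate set picks the same winner.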

**Status (core landed 2026-05-29):** pure `Recovery.js` — `compareAuthority`/`isAuthoritative` (older-sim wins, lower-id tiebreak, total order), `resolveDesync` (serve vs adopt), `shouldResetSimulationAge`/`resetSimulationAge` (lagging-peer wake), `makeStateTransfer`/`validateStateTransfer` (carries `lastAttendanceTicks`) + `mergeLastAttendanceTicks` grow-only-max (the #30 slow-path fallback). Proven via `tests/recovery.test.js` (32 unit) + `tests/recovery-harness.test.js` (4) + `test-harness/selftest-b8.mjs` (4, Node 12 too): two partitioned groups converge to the older history after heal (age beats id); equal-age tie → lower id; lagging stale peer resets/yields/adopts, with a disabled-reset CONTROL proving the reset is load-bearing; #30 attendance-tick merge end-to-end. **Remaining for full done:** wiring authority/state-challenge into the v1 engine (which has none today) rides the B9 finalization rework. The #30 fast-path probe itself is the separate **B7.1**.

---

## Goal B9 — Tick Finalization + Memory Bounding

**Why:** Long-running sessions must not leak.

**Deliverable:**
- Ticks past the grace window become immutable
- Snapshots, query logs, sparse input entries past finalization are collected

**Success:**
- A 30-minute simulated session shows bounded memory growth (within a constant factor of acceptance × participant count)

**Verify:**
- Run a long-session scenario; record heap snapshots at 5-minute marks; growth must asymptote

**Status (core landed 2026-05-29):** pure `Finalization.js` — `TickFinalizer` tracks the finalization horizon as GROW-ONLY-MAX (`maxCurrentTick - graceWindowTicks`, so a recovery backward tick-jump can never un-finalize collected ticks); ticks strictly older than the horizon are immutable. Two payload-agnostic tick-only GC policies: `collectAnchored` (carry-forward — snapshots, last-input-per-participant: keep the latest entry ≤ horizon as the re-sim anchor + everything after) and `collectBelow` (no-carry-forward — query logs: drop everything strictly below the horizon). Proven via `tests/finalization.test.js` (21 unit) + `tests/finalization-harness.test.js` (4) + `test-harness/selftest-b9.mjs` (4, Node 12 too): rather than wall-clock heap snapshots (not deterministically AI-verifiable), bounding is proven by a `retainedCount` PLATEAU — identical retention at tick 600 and 1200 of a synthetic workload — against a GC-DISABLED CONTROL that grows ~linearly (1121→2241), proving the GC is load-bearing; plus the rollback-anchor invariant and sparse carry-forward. **Remaining for full done:** the engine actually releasing retained per-tick data on the finalization sweep rides **B-Integrate** (DECISIONS #32). Heap-growth verification under a real engine is a Phase C / B-Integrate item.
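For readers new to the two GC policies, here is a compact re-implementation of the ideas in the status note. It covers a grow-only-max horizon, a carry-forward collector, and a no-carry-forward collector. It reuses the policy names for readability but is an illustrative sketch, not `Finalization.js`. Entries are assumed to be `{tick, ...}` objects sorted by tick.

```javascript
// Grow-only-max finalization horizon: a recovery rewind can never un-finalize ticks.
class TickFinalizerSketch {
  constructor(graceWindowTicks) {
    this.grace = graceWindowTicks;
    this.horizon = -Infinity;
  }
  observe(currentTick) {
    this.horizon = Math.max(this.horizon, currentTick - this.grace);
    return this.horizon;
  }
}

// Carry-forward (snapshots, last input per participant): keep the newest entry at or
// before the horizon as the re-sim anchor, plus everything after it.
function collectAnchored(entries, horizon) {
  let anchor = -1;
  for (let i = 0; i < entries.length && entries[i].tick <= horizon; i++) anchor = i;
  return entries.slice(Math.max(anchor, 0));
}

// No carry-forward (query logs): drop everything strictly below the horizon.
function collectBelow(entries, horizon) {
  return entries.filter((e) => e.tick >= horizon);
}
```

Running `collectBelow` every tick on a log that grows by one entry per tick leaves a constant `graceWindowTicks + 1` entries, which is the retained-count plateau the status note uses in place of heap snapshots.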

---

## Goal B10 — Random-Peer Bootstrap + Catching-Up State

**Why:** New joiners need full state. One peer always serving creates load imbalance and a single point of pressure. Joiners need to surface their busy state to the game.

**Deliverable:**
- Bootstrap request → random selection among eligible peers
- Bootstrap payload: state at grace-window edge + sparse input log since then + current per-participant input state
- New peer enters explicit "catching up" lifecycle state visible to Layer 3

**Success:**
- Over 100 joins, serving-peer distribution is approximately uniform
- Game-layer callbacks fire on entering and leaving the catching-up state

**Verify:**
- "Bootstrap on join" scenarios pass
- Statistical check: a chi-square test, or visual inspection, of the serving-peer histogram

**Status (core landed 2026-05-29):** pure `Bootstrap.js`, consisting of:
- `eligibleServers`: a LIVE-only filter (a still-catching-up peer can't serve a coherent present).
- `selectServingPeer(candidates, rng)`: a joiner-side UNIFORM pick with an INJECTED seeded `rng`, index clamped into range so a misbehaving rng never over-indexes; over many joins it yields a uniform serving-peer histogram with no load sink.
- `makeBootstrapPayload`/`validateBootstrapPayload`: the DECISIONS #18 three-piece payload in a frozen wrapper:
  - the snapshot at the grace-window edge `currentTick - graceWindowTicks`, carried BY REFERENCE;
  - the since-edge sparse input log (entries STRICTLY after the edge are enforced; anything at or before the edge is already baked into snapshot+baseline);
  - the per-participant baseline intent IN EFFECT AT THE EDGE, which concretizes #18's "current per-participant input state" as B9's anchored last-input-per-participant.
- `reconstructInputs`: a baseline-seeded `SparseInputDecoder` per participant; log-only/never-heard participants read as `null` (passive, per B3), so re-sim reproduces every participant's intent at every tick ≥ edge via hold-last.
- A MONOTONIC Layer-3-visible `CatchUpTracker`: CATCHING_UP → LIVE once within `toleranceTicks`, never flapping back.

The serving request is a single point-to-point RELIABLE send, never a broadcast (sparseness). Proven via `tests/bootstrap.test.js` (21 unit) + `tests/bootstrap-harness.test.js` (4, `BootstrapNode` server/joiner roles) + `test-harness/selftest-b10.mjs` (4, Node 12 too):
- UNIFORM SERVING-PEER DISTRIBUTION over 100 joins (chi-square < 13.28, df=4, p=0.01);
- SPARSE single-server contact (exactly one server served per join);
- the catching-up lifecycle (one enter/leave pair, ends LIVE, adopts the served snapshot at the captured present tick);
- re-sim fidelity (baseline held + since-edge changes at the right ticks; never-heard ⇒ null).

**Remaining for full done:** WHEN the engine requests a bootstrap, the real transport send, and the live re-simulation loop ride **B-Integrate** (DECISIONS #32).
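The selection and the statistical check can both be sketched briefly. The picker below takes an injected `rng` and clamps the index, covering out-of-range and NaN values, so a misbehaving generator can never over-index. `chiSquare` computes Pearson's statistic against a uniform expectation, to be compared with the 13.28 critical value (df=4, p=0.01) quoted above. Both are illustrations, not the `Bootstrap.js` code.

```javascript
// Joiner-side uniform pick with an injected rng and a defensive index clamp.
function selectServingPeer(candidates, rng) {
  if (candidates.length === 0) return null;
  const raw = Math.floor(rng() * candidates.length);
  const i = Number.isFinite(raw) ? Math.min(Math.max(raw, 0), candidates.length - 1) : 0;
  return candidates[i];
}

// Pearson chi-square statistic of observed counts against a uniform expectation.
function chiSquare(counts) {
  const total = counts.reduce((s, c) => s + c, 0);
  const expected = total / counts.length;
  return counts.reduce((s, c) => s + (c - expected) ** 2 / expected, 0);
}
```

A uniform histogram such as `[20, 20, 20, 20, 20]` scores 0, while sending every join to one peer, `[100, 0, 0, 0, 0]`, scores 400, far above the threshold.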

---

## Goal B-Integrate — Assemble the v2 Engine (Big-Bang Integration)

**Why:** Phase B builds each protocol mechanism (B1–B10) as a pure, harness-validated core with its in-engine wiring deliberately deferred. Those cores must be assembled into the live engine before Phase C can migrate the examples (C2) or run against a real transport (C1). Decided BIG-BANG (`DECISIONS.md` #32): integrate once, after the cores stabilize, rather than folding each in incrementally — concentrate the composition risk in one validated step.

**Prerequisite — the seam (`KNOWN_ISSUES.md` #7), gating first step:**
- Cut the wall-clock + `requestAnimationFrame` coupling: `EasyMultiplayer` accepts an injected clock and a manual tick driver (as the harness nodes do), so the assembled engine is runnable — and testable — under `PeerHarness`. Nothing else in this goal can be validated until this lands.
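The seam amounts to "the engine reads time only from an injected clock and advances only when told to". The sketch below uses hypothetical names (`ManualClock`, `SteppedEngine`) to show that shape. A harness calls `clock.advance()` then `step()`; a browser wrapper could call the same `step()` from `requestAnimationFrame`. It also shows a `nodeFactory` in the `(transport, opts) => engine` form the harness expects.

```javascript
// Test-controlled clock: time moves only when the harness says so.
class ManualClock {
  constructor() {
    this.ms = 0;
  }
  now() {
    return this.ms;
  }
  advance(ms) {
    this.ms += ms;
  }
}

// Hypothetical engine shell: never touches Date.now() or requestAnimationFrame.
class SteppedEngine {
  constructor(transport, { clock, tickMs }) {
    this.transport = transport;
    this.clock = clock;
    this.tickMs = tickMs;
    this.tick = 0;
  }
  // Run every tick whose start time the injected clock has reached; returns the current tick.
  step() {
    while ((this.tick + 1) * this.tickMs <= this.clock.now()) this.tick++;
    return this.tick;
  }
}

// Harness-style factory: (transport, opts) => engine.
const nodeFactory = (transport, opts) => new SteppedEngine(transport, opts);
```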

**Deliverable:**
- A v2 engine composing `LocalIntent` (B3), `QueryContext` (B4), `HashWindow` (B5), `AcceptanceWindows` (B6), `DisconnectTracker` (B7; the B7.1 `DisconnectProbe` is superseded + deleted, see §B7 above), `Recovery` (B8), the B9 finalization/GC core, and the B10 bootstrap path — over an injected Transport (A5) and injected clock.
- The sparse-input + attendance protocol (B1/B2) replaces v1's per-tick input and flow-inferred liveness inside the engine.
- A real-engine `PeerHarness` `nodeFactory` so existing scenario IDs run end-to-end against the assembled engine, not just the per-goal harness nodes.

**Success:**
- A representative cross-section of the Phase B scenario catalog passes against the assembled engine under `PeerHarness`.
- No regression in the existing unit/harness suite.

**Verify:**
- Wire the real engine as a `PeerHarness` `nodeFactory`; re-run a chosen subset of B1–B10 scenarios end-to-end against it.

**Sequencing:** after B9 + B10 (all Phase B cores done), before Phase C. Absorbs every "in-engine wiring rides B9" deferral accumulated across B1–B8/B7.1.

**Status: DONE (2026-05-29).** `SimulationEngine.js` composes all ten cores over an injected Transport (A5) + injected clock, driven by a manual tick (the `KNOWN_ISSUES.md` #7 seam). Assembled across five validated layers, each with a test gate: L1 seam + sparse-input (B1/B3), L2 sim + rollback + query-freeze + hash-window (B4/B5), L3 acceptance/grace + disconnect + probe + recovery (B6/B7/B7.1/B8), L4 finalization GC + bootstrap-on-join (B9/B10), L5 end-to-end composition. Exposed as a `PeerHarness` `nodeFactory = (transport, opts) => new SimulationEngine(transport, opts)`; `tests/engine-l1..l4` + `tests/engine-integration.test.js` run a representative B1–B10 cross-section through it end-to-end, plus a Node-12 `test-harness/selftest-b-integrate.mjs`. Full suite 373/373 + all harness selftests green on Node 20 and Node 12. Three real-transport bootstrap gaps deferred to Phase C (`KNOWN_ISSUES.md` Risk #8); the `usedInputs`/"wait"-tier recovery wiring and the shipped `lagThresholdTicks`/`probeLeadTicks` defaults remain open (`KNOWN_ISSUES.md` #4).

---

# Phase C — Polish & Scale Validation

## Goal C1 — Real Transport Behind New Interface — ✅ DONE (2026-05-29)

**Why:** Validate that the Transport interface is actually general — not accidentally `MemoryTransport`-shaped.

**Deliverable:**
- `transports/TrysteroTransport.js` cleanly implementing the spec — ✅ refactored: trystero binding (`{ joinRoom, selfId }`) INJECTED via constructor; top-level https import removed; async `createTrysteroTransport()` factory is the single place that touches the remote URL (deferred dynamic `import()`), so the module loads under a plain Node ESM loader.
- A representative subset of scenarios runnable against it (the subset that doesn't need wall-clock control) — ✅ the full 12-point transport conformance suite runs against the REAL `TrysteroTransport` over a deterministic `FakeTrysteroNetwork` (NetworkSim/VirtualClock-backed) — `tests/trystero-transport.test.js` (17 tests, Vitest) + `test-harness/selftest-trystero.mjs` (16, Node-12).
- Fixed three conformance gaps in the adapter: pre-connect send/broadcast now THROW (#9), post-disconnect is a no-op, unknown/departed-peer send is a no-op (#6), and `{ reliable:true }` maps onto a distinct reliable trystero action label (#11, `TRYSTERO_LABELS`).
- `EasyMultiplayer.js` now constructs the default transport via `createTrysteroTransport()`.
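The injection pattern and the three fixed conformance behaviors can be illustrated with a small adapter. The sketch assumes trystero's room API shape, which should be checked against the trystero documentation for the pinned version: `joinRoom(config, roomId)`, `room.makeAction(label)` returning a send function and a receive-registration function, `room.onPeerJoin`/`room.onPeerLeave`, and `room.leave()`. The class name, the `msg`/`msgR` labels, and the method names are hypothetical, not `TrysteroTransport.js` or `TRYSTERO_LABELS`.

```javascript
// Sketch: the trystero binding ({ joinRoom, selfId }) is injected, so tests pass a fake
// and Node never resolves a remote URL.
class TrysteroTransportSketch {
  constructor({ joinRoom, selfId }, config, roomId) {
    this.joinRoom = joinRoom;
    this.localId = selfId;
    this.config = config;
    this.roomId = roomId;
    this.state = "idle"; // idle -> connected -> closed
    this.peers = new Set();
    this.handlers = [];
  }

  connect() {
    this.room = this.joinRoom(this.config, this.roomId);
    this.room.onPeerJoin((peerId) => this.peers.add(peerId));
    this.room.onPeerLeave((peerId) => this.peers.delete(peerId));
    // Two distinct action labels so { reliable: true } is routed separately.
    const [sendBestEffort, onBestEffort] = this.room.makeAction("msg");
    const [sendReliable, onReliable] = this.room.makeAction("msgR");
    this.senders = { bestEffort: sendBestEffort, reliable: sendReliable };
    const deliver = (data, fromPeerId) => this.handlers.forEach((h) => h(data, fromPeerId));
    onBestEffort(deliver);
    onReliable(deliver);
    this.state = "connected";
  }

  onMessage(handler) {
    this.handlers.push(handler);
  }

  // Throws before connect(); returns false (no-op) after disconnect().
  _usable() {
    if (this.state === "idle") throw new Error("Transport used before connect()");
    return this.state === "connected";
  }

  send(peerId, data, { reliable = false } = {}) {
    if (!this._usable() || !this.peers.has(peerId)) return; // unknown/departed peer: no-op
    const sender = reliable ? this.senders.reliable : this.senders.bestEffort;
    sender(data, peerId);
  }

  broadcast(data, { reliable = false } = {}) {
    if (!this._usable()) return;
    const sender = reliable ? this.senders.reliable : this.senders.bestEffort;
    sender(data); // no target: every peer in the room
  }

  getPeers() {
    return [...this.peers];
  }

  disconnect() {
    if (this.state === "connected") this.room.leave();
    this.state = "closed";
    this.peers.clear();
  }
}
```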

**Success:**
- The chosen subset passes against real Trystero in a browser — ⏳ rides Goal C4 (browser, human-verified). The in-Node conformance proves the ADAPTER plumbing; real WebRTC / NAT / WebTorrent-swarm / wire-jitter / chunking behavior is browser-only.

**AI-testing limitations (CLAUDE.md §4):** the fake mesh cannot reproduce real WebRTC data channels, NAT traversal, or the signalling swarm. Conformance #11 (reliable-survives-loss) exercises the reliable-channel PLUMBING via the fake's loss model; real trystero channels are ALL reliable+ordered, so this is a wiring test, not a claim that any real trystero channel is best-effort or drops messages. The remote dynamic import in `createTrysteroTransport()` is never invoked in Node (it resolves an https ESM URL the loader rejects).

**Verify:**
- Browser test page (Goal C4) is green for the chosen subset — ⏳ pending C4.

---

## Goal C2 — Example Migration

**Why:** The pacman examples are how new users learn the API; they must demonstrate v2.

**Deliverable:**
- Pacman and graph-pacman migrated to the v2 API (`getLocalInputs`, `query(ctx, ...)`, etc.)

**Success:**
- Both examples play correctly in a browser with at least 2 peers
- Code is shorter than, or comparable in length to, the v1 versions

**Verify:**
- Load each example in two browser tabs against a real transport; basic play works

**Status (2026-05-29): facade rewire DONE; browser smoke pending (human, rides C4).**
The public facade (`EasyMultiplayer.js`) now runs on the assembled `SimulationEngine` instead of the
v1 SyncedScene/RollbackNetcode stack, with the four game services the engine lacks ported as a thin
Node-testable layer (`SeededRandom`, `EventSystem` via a new `onRollback` engine seam, `PresentationHints`,
deterministic in-step `playerJoined` / transport `playerLeft`). The public API is preserved, so neither
`examples/pacman.html` nor `examples/graph-pacman.html` needs source edits. The multiplayer-correctness core
the examples rely on is verified headlessly across 2 peers (`tests/easy-multiplayer-integration.test.js`, 5
tests: convergence under latency-driven rollback, RNG determinism, deterministic join, event cancel/confirm,
and the exact example API paths — defineInput + both query forms + manageState). Full suite 417/417 + Node-12
harness green. The literal `graph-pacman-game.js` imports remote `https://` modules + three and cannot load in
a plain Node loader; its in-browser play (pixels, real WebRTC, DOM keys, audio) is the human/real-env remainder,
folded into C4.

---

## Goal C3 — Scale Validation

**Why:** "Scales to many passive peers" must be measured, not asserted.

**Deliverable:**
- A scale scenario: 1 active + 100 passive participants on `MemoryTransport`
- Metrics output: per-peer message volume, per-peer memory, active peer's rollback frequency

**Success:**
- Passive peers send only attendance (no input messages)
- Active peer's rollback frequency does not depend on passive-peer count
- Memory per passive peer is bounded

**Verify:**
- Run the scale scenario; inspect metrics; confirm the three success bullets above
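Per-peer message volume can be collected without touching the engine by wrapping each peer's transport. The sketch counts outgoing messages by a caller-supplied `kindOf` function. Reading a `type` field is an assumption about message shape. It exposes only the two sending methods, which is enough for metrics.

```javascript
// Sketch: count outgoing messages per kind for one peer (C3 metrics).
function withMessageCounts(transport, kindOf) {
  const counts = new Map(); // kind -> number of messages sent
  const bump = (msg) => {
    const kind = kindOf(msg);
    counts.set(kind, (counts.get(kind) || 0) + 1);
  };
  return {
    inner: transport, // everything else goes through the wrapped transport directly
    counts,
    send(peerId, msg, opts) {
      bump(msg);
      return transport.send(peerId, msg, opts);
    },
    broadcast(msg, opts) {
      bump(msg);
      return transport.broadcast(msg, opts);
    },
  };
}
```

With one wrapper per passive peer, the first Success bullet becomes a direct check that `counts.get("input")` is absent for every passive peer.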

---

## Goal C4 — Browser-Based Multi-Instance Test Page

**Why:** Real WebRTC / NAT / browser quirks aren't visible to `MemoryTransport`.

**Deliverable:**
- A page that spins up N browser tabs wired to `TrysteroTransport`
- Same scenario runner as the harness, where feasible

**Success:**
- The subset of scenarios that don't require virtual-clock control passes in real browsers

**Verify:**
- Load the page in 2+ browsers; observe scenario results

---

## Goal C5 — Determinism Enforcement

**Why:** Floating-point and `Math.random` differences silently cause desyncs.

**Deliverable:**
- A documented strategy (override globals? seeded alternatives only? detector tool? all of the above?)
- Implementation of the chosen strategy

**Success:**
- A deliberately non-deterministic game (uses `Math.random` directly) is flagged or fails clearly, not silently

**Verify:**
- Run a determinism-violation scenario; observe explicit error or warning, not silent desync
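As one candidate for the "detector tool" option, a debug build could swap `Math.random` for a throwing stub while a simulation step runs, then restore it. The wrapper name and the choice to throw rather than warn are assumptions. This covers only `Math.random`; floating-point differences need a separate strategy.

```javascript
// Sketch of a debug-time detector (C5): Math.random throws inside a simulation step.
function runGuardedStep(step) {
  const original = Math.random;
  Math.random = () => {
    throw new Error("Math.random called inside a simulation step; use the seeded RNG");
  };
  try {
    return step();
  } finally {
    Math.random = original; // always restored, even if the step threw
  }
}
```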

---

## Goal C6 — Bus-Based Rollback Subsystem

**Why:** Long rollbacks currently freeze the visible simulation. Bus-based re-simulation keeps it smooth. This is `DECISIONS.md` #20.

**Deliverable:**
- Decoupled "displayed state" vs "authoritative latest state"
- Bus worker abstraction: travels from a start tick forward, snapshottable, supersede-able
- Policy decisions: parallel vs time-sliced, max concurrent buses, min tick gap between buses

**Success:**
- A scenario with a 200-tick rollback keeps rendering smoothly at 60 FPS (the draw loop keeps producing frames; visible state never freezes)
- Multiple rapid rollbacks spawn multiple buses without thrashing
- Eventually one bus reaches the present and visible state catches up to authoritative state

**Verify:**
- "Long rollback smoothness" scenario passes — measured by visible-state-tick-delta per wall-clock tick staying constant during the rollback
- Stress test: trigger 5 rollbacks within 30 ticks; system still converges
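To make "supersede-able" concrete, here is one deliberately simple, time-sliced policy that keeps at most one live bus. The Deliverable leaves parallel buses and concurrency limits open. When a rollback to tick `t` arrives, any bus that has already simulated past `t` is superseded because its state is stale. A bus still at or before `t` keeps going, since it will consume the corrected input when it reaches `t`. Names and policy are assumptions.

```javascript
// Sketch of a C6 bus scheduler: time-sliced re-simulation with supersession.
class BusScheduler {
  constructor() {
    this.buses = [];
    this.nextId = 0;
  }
  onRollback(tick) {
    // Buses past `tick` computed states from now-stale inputs: drop them.
    this.buses = this.buses.filter((b) => b.currentTick <= tick);
    // A surviving bus will pass `tick` with the corrected input; otherwise start one.
    if (this.buses.length === 0) {
      this.buses.push({ id: this.nextId++, startTick: tick, currentTick: tick });
    }
  }
  // Advance live buses by at most budgetTicks this frame; return the bus that
  // reached the present (visible state can now catch up), or null.
  advance(budgetTicks, presentTick) {
    for (const bus of this.buses) {
      bus.currentTick = Math.min(bus.currentTick + budgetTicks, presentTick);
    }
    const done = this.buses.find((b) => b.currentTick === presentTick) || null;
    if (done) this.buses = [];
    return done;
  }
}
```

The per-frame `budgetTicks` is what keeps rendering smooth: a 200-tick rollback is spread over several frames instead of blocking one.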

---

# Meta-Goals (Cross-Cutting)

## Goal M1 — Documentation Stays Current

**Why:** Docs drift faster than code if not enforced.

**Deliverable:**
- Each Phase B goal completion includes updates to `ARCHITECTURE.md` (current-state mapping), `DECISIONS.md` (any new decisions), `PROGRESS_LOG.md` (the session)

**Success:**
- A new contributor reading the docs after Phase B can build a mental model that matches the code

**Verify:**
- Spot-check: pick a random concept in `ARCHITECTURE.md`; grep for the corresponding code; the match should be obvious without reverse-engineering

---

## Goal M2 — Scenario Catalog Stays Comprehensive

**Why:** Scenarios are our spec; gaps in the catalog are gaps in the spec.

**Deliverable:**
- Each Phase B goal adds the scenarios that drove it to `TEST_SCENARIOS.md`
- Each Phase B goal in this `GOALS.md` references the scenario IDs it claims to satisfy

**Success:**
- At end of Phase B, ≥150 scenarios documented; ≥80% mechanically tested via the harness

**Verify:**
- Count scenarios; cross-check against goal completions

---

## Goal M3 — Public API Stability After Phase B

**Why:** Examples and downstream users need a stable target.

**Deliverable:**
- Versioned public API (`EasyMultiplayer` class signature, callback contracts)
- A migration note in `DECISIONS.md` for each breaking change made during Phase B

**Success:**
- After Phase B, no surface change for two consecutive minor versions without an explicit breaking-change decision

**Verify:**
- Spot-check: read `EasyMultiplayer.js` public method signatures; compare with examples — they should match
