# Known Issues — Easy-Multiplayer

## Pre-Redesign Tech Debt

Carries over from v1. Will be addressed opportunistically during Phase B.

1. ~~**Vitest does not run on Node 12**~~ RESOLVED under Goal A6 (2026-05-29). Root cause was the default `node` on PATH (system v12); Vitest 4 needs Node ≥18 (nvm has 20/24). Pinned with `.nvmrc` (20), `engines.node >=18`, `.npmrc engine-strict`, and a `pretest` guard that fails fast with a clear message on old Node. `npm test` runs 6 files / 98 tests on Node 20 & 24.
2. **Sparse real test coverage** beyond ~4 small test files
3. **Global state dependence** — `scene`, `camera`, `myId`, `window.dispose`
4. **Mixed class/prototype patterns** — inconsistent OOP style
5. **Hardcoded CDN URLs** for Trystero and SyncedClock — Trystero's URL is now isolated to a single deferred dynamic `import()` in `createTrysteroTransport()` (C1, 2026-05-29), so it no longer breaks Node module load and is a one-line change; SyncedClock's is unchanged.
6. **Limited error handling** — most functions assume valid inputs
7. **`RollbackNetcode.js` is 2111 lines** — likely needs decomposition during Phase B
8. **Tight Three.js coupling in optional modules** (ShownPlayer, SoundManager)
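
Item 1 describes a `pretest` guard that fails fast on an old Node. The project's real script is not shown here, so the following is only a minimal sketch of such a guard. The names `MIN_NODE_MAJOR`, `parseNodeMajor`, and `nodeVersionProblem` are hypothetical, and the wording of the message is an assumption.

```javascript
// Hypothetical pretest guard; the project's actual script may differ.
const MIN_NODE_MAJOR = 18; // mirrors `engines.node >=18`

// Parse the major version out of a string such as "v20.11.1" or "12.22.0".
function parseNodeMajor(version) {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error(`Unrecognised Node version string: ${version}`);
  }
  return Number(match[1]);
}

// Return null when the version is acceptable, otherwise a readable message.
function nodeVersionProblem(version, minMajor = MIN_NODE_MAJOR) {
  const major = parseNodeMajor(version);
  if (major >= minMajor) return null;
  return (
    `Node ${version} is too old: tests need Node >= ${minMajor}. ` +
    'Run `nvm use` to pick up the version pinned in .nvmrc.'
  );
}

// Run as a script: report the problem and set a failing exit code
// before the test runner starts.
const problem = nodeVersionProblem(process.version);
if (problem) {
  console.error(problem);
  process.exitCode = 1;
}
```

A guard like this would be wired as the `pretest` script in `package.json`, so `npm test` stops with the message above instead of a confusing runner error. It deliberately avoids syntax that an old Node could not parse, since it has to run on the very version it rejects.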

## v2 Redesign — Open Design Questions

Most questions raised during the 2026-05-25 planning session were answered (see `DECISIONS.md` #11–21). These remain:

1. **Bus-based rollback details (Phase C+):** parallel execution vs time-sliced, max concurrent buses, minimum tick gap between buses, selection policy when multiple buses converge on the present
2. **~~Bootstrap "catching-up" state surfacing:~~ RESOLVED (B10, 2026-05-29, DECISIONS #34f):** Layer 3 sees BOTH — a query-time status flag (`CatchUpStatus.CATCHING_UP` / `LIVE`, exposed via the tracker's `status`) AND a lifecycle callback pair (`onEnterCatchingUp` fired on join, `onLeaveCatchingUp` fired once the joiner reaches the present). The `CatchUpTracker` makes the transition MONOTONIC (enter once, leave once, never flaps back), so the game gets exactly one enter/leave pair per join. See `Bootstrap.js` / TEST_SCENARIOS S-021-03. (The engine-side wiring of *when* the callbacks fire in the live loop rides B-Integrate.)
3. **Determinism enforcement strategy:** override `Math.random` / `Date.now` globals? Provide seeded alternatives only? Build a determinism detector tool? Some combination?
4. **Default tunables:** sensible defaults for acceptance window, grace window, hash interval, attendance interval. Even tunable, the library should ship reasonable starting numbers. PARTIAL (B2, 2026-05-29): attendance defaults chosen — `attendanceIntervalMs=500`, `livenessTimeoutMs=2000`, `livenessSweepMs=interval` (HeartbeatLiveness; see PROTOCOL_SPEC). Acceptance/grace/hash-interval defaults chosen in B6 (DECISIONS #28: 200/300ms, 20-tick hash interval). NEW open tunable from B8: the lagging-peer-wake threshold (`lagThresholdTicks` — "several seconds behind") has no shipped default yet; the B8 tests pass a per-scenario value. Pick it alongside the B9/C work. (The former B7.1 `probeLeadTicks` tunable is GONE — the suspicion probe it parameterised was superseded + deleted 2026-06-05, replaced by beat forwarding; see DECISIONS #30. A new load-bearing precondition takes its place: `timeoutTicks` must span several gossip intervals so a forwarded beat propagates network-wide before any node times the player out — `DESIGN_PARTICIPATION.md` §6.2, to be hardcoded as a dev-mode assert.)
5. **~~Predicate context cloning efficiency:~~ RESOLVED (B4, 2026-05-29, DECISIONS #26d):** `QueryContext.cloneCtx` prefers the host `structuredClone` (Node ≥17 / modern browsers — preserves `undefined`, Date/Map/Set, key-order independent), with a hand-rolled cycle-safe recursive walker as the fallback for older hosts (the Node-12 selftest exercises the walker). Both are debug-mode-only; production does zero cloning. Functions/symbols are carried by reference.
6. ~~**Transport interface details:**~~ RESOLVED under Goal A1 (2026-05-28) — see `TRANSPORT_SPEC.md` + `DECISIONS.md` #22.
7. **~~Harness ↔ real engine wiring (A3 depends on A5):~~ RESOLVED (C2.2, 2026-05-29):** `EasyMultiplayer` no longer hard-constructs `SyncedClock`/`requestAnimationFrame` for the simulation loop: it now backs onto `SimulationEngine`, which takes an injected clock (`options.clock`, defaulting to the new `RealtimeClock`) and self-drives its tick via `clock.schedule(tickMs, …)`; rAF is used only for the optional browser DRAW loop. The facade is therefore drivable headlessly with a `VirtualClock` + `MemoryTransport` and scripted inputs (see `tests/easy-multiplayer-integration.test.js`). The old "wire v1 now vs defer" decision is moot: the v2 engine exists and the facade runs on it.
8. **Canonical disconnect tick (from S-005-06):** TWO PARTS. **(local) RESOLVED (B7, 2026-05-29, DECISIONS #29):** the tick is `lastAttendanceTick(peer) + timeoutTicks`, which is deterministic from the shared stamp, not local detection time; a newer beat only moves it FORWARD (grow-only-max) and forces a disconnect-conditional rollback (restart at `old`) only if the peer already simulated past `old`. Pure core in `DisconnectTracker.js`; see TEST_SCENARIOS 13/14. **(sparse convergence) DECIDED (DECISIONS #30), mechanism REVISED 2026-06-05:** peers that heard DIFFERENT last beats converge via **beat FORWARDING (grow-only-max gossip)** + B5-desync / B8-recovery fallback carrying last-attendance-tick. This explicit sparse mechanism RETIRES the old #17 "no special algorithm" stance. Eventual agreement holds within a connected component, not instantaneously/globally. **SUPERSEDED:** the original FAST-PATH was a relevance-gated *pull-on-suspicion probe* (Goal B7.1, `DisconnectProbe.js`, "attendance never proactively forwarded"). It is DELETED: code + tests (`disconnect-probe*.test.js`, `engine-l3-probe.test.js`, `selftest-b7.1.mjs`) removed, engine unwired, because a 2-trip probe over the same reliable transport can never rescue a case 1-trip reliable gossip can't, and being one-shot per `(playerId, tickY)` a single dropped correction caused a false disconnect (`DESIGN_PARTICIPATION.md` §6.2). Beat forwarding itself (the `relevantPlayers` forward-gate) is design-stage (§6.1). The SLOW-PATH fallback primitive landed in B8 (2026-05-29) and is RETAINED as the backstop: the `Recovery.js` state transfer carries per-participant `lastAttendanceTicks` and `mergeLastAttendanceTicks` folds them grow-only-max, so a recovery transfer reconciles a disconnect-tick disagreement (a stale transfer can never roll an attendance tick backward). The speculative-disconnect-rollback question (#10) and long-partition beyond-grace recovery remain.
9. **inputDelay ↔ acceptance-window composition (from S-003-07):** an intent delivered within the input-delay buffer needs no rollback; one delivered after needs one. Are inputDelay and the acceptance window measured in the same tick space, and against the receiver's tick, the sender's stamp, or arrival time? (S-002-05 hits the same ambiguity.)
10. **Speculative-disconnect rollback (from S-004-03):** during the attendance-timeout window a peer is "maybe gone." Does a receiver optimistically keep simulating its held input until the disconnect resolves, and roll the speculative disconnect back if attendance resumes?
11. **Batched vs per-message rollback (from S-002-04):** when multiple corrections land in one processing batch, is rollback computed once to the earliest affected tick, or re-applied per message? Batching avoids redundant re-simulation.
12. **App traffic as liveness evidence (from S-005-04):** TRANSPORT_SPEC guarantees that *silence* of app messages must not drop a peer. The converse is unspecified: can app traffic (intents) keep a peer alive when the dedicated attendance fails, or is liveness strictly the heartbeat channel?
13. **~~Canonical default/neutral input (from S-003-05, S-003-03):~~ RESOLVED (B3, 2026-05-29, DECISIONS #25b/c):** the pre-history default is `null` (PASSIVE), and "passive" is full EXCLUSION from input-bearing logic — NOT a defined neutral input object. A never-heard-from participant is passive everywhere until its first intent arrives; the reconstructed value is literally `null`. A game MAY override `defaultIntent`, but all peers must agree. See LocalIntent.js / TEST_SCENARIOS S-003-03, S-003-05.
14. **Event semantics under rollback (from S-001-07):** on rollback, are events emitted in the discarded timeline retracted/un-emitted, or do listeners observe both timelines? Also: is the query set allowed to change over time, and if so are past finalized corrections retro-applied (S-002-07)?
15. **Synced-tick distributed EDGE cases (from DECISIONS #37 "STILL UNSOLVED"):** the opt-in `syncedTick` mode is proven only for late-join ordering (a founder anchors the frame). The hard sparse-P2P cases — simultaneous cold-start tie-break (EDGE 1), two synced groups merging and re-settling (EDGE 2), large recalibration without a backward tick lurch / state resync (EDGE 3/6), staleness self-demotion + forged-seniority defense (EDGE 5), and longest-PRESENT-wins vs oldest-time (EDGE 4) — are designed and worked in `research/synced-clock/` (see its `README.md`); executable specs are the EDGE blocks in `tests/synced-tick-distributed-spec.test.js`.
16. **Scrap non-finalized desync HEALING? (from DECISIONS #39):** a non-finalized checkpoint differing while both peers used the SAME relevant inputs means the game step itself is non-deterministic; a state transfer cannot fix it (re-running identical inputs re-diverges on the same frame — only finalization, which freezes one history, resolves it). DETECTION still has diagnostic value (surface a non-determinism warning to the dev), but the repair path may be DROPPED entirely. Decide before building the #39 healing protocol.
17. **~~Which frame hashes to broadcast?~~ DECIDED (2026-06-02, DECISIONS #39):** finalized CHECKPOINT hashes (checkpoint ≤ grace horizon) are ALWAYS broadcast; non-finalized checkpoint hashes are sent ONLY behind the opt-in `sendNonFinalizedHashes` flag (off by default — they can reveal only a rollback-able / non-deterministic divergence a transfer cannot repair, so their value is unproven). Hashes exist only at checkpoints, so the "most recent finalized frame" each peer reports is the latest checkpoint at-or-before the horizon, NOT every frame past grace. Still open: whether the flagged non-finalized path proves useful enough to ever default on.
18. **Simplify HashWindow's confirmed/UNCONFIRMED input tracking? (from DECISIONS #39):** `compareHashWindows` carries each `usedInput` flagged confirmed/unconfirmed. The user argues an unconfirmed input is DERIVABLE from the last confirmed value plus "silence = no change" (updates communicate only changes), so the explicit unconfirmed track may be redundant. Revisit when implementing the #39 delta-sync classification.
19. **Re-request cadence for lost recovery requests (from DECISIONS #39):** the loser must re-request full state on a rate-limited timeout if a `stateChallenge`/`stateTransfer` is lost (pinned RED by `tests/recovery-lost-request.test.js`). Exact cadence is unspecified — "implement something that makes sense"; pick alongside the other B8 defaults (#4).
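
Question 8 rests on two small rules: the disconnect tick is `lastAttendanceTick + timeoutTicks`, and last-attendance ticks are merged grow-only-max. The sketch below illustrates both under stated assumptions. It is not the code in `DisconnectTracker.js` or `Recovery.js`. The function names `disconnectTick` and `mergeGrowOnlyMax`, the use of a `Map` keyed by player id, and the value `timeoutTicks = 40` are all hypothetical.

```javascript
// Canonical disconnect tick: derived from the shared attendance stamp,
// not from the moment this peer happened to notice the silence.
function disconnectTick(lastAttendanceTick, timeoutTicks) {
  return lastAttendanceTick + timeoutTicks;
}

// Grow-only-max merge of per-participant last-attendance ticks.
// Returns a new Map; neither input is mutated. An incoming tick only
// replaces a known one when it is strictly newer, so a stale view can
// never move a participant's tick backward.
function mergeGrowOnlyMax(local, incoming) {
  const merged = new Map(local);
  for (const [playerId, tick] of incoming) {
    const known = merged.get(playerId);
    if (known === undefined || tick > known) {
      merged.set(playerId, tick);
    }
  }
  return merged;
}

// Two peers that heard DIFFERENT last beats from "alice".
const peerA = new Map([['alice', 100], ['bob', 90]]);
const peerB = new Map([['alice', 120]]);

// Merging in either direction yields the same view.
const mergedAtA = mergeGrowOnlyMax(peerA, peerB);
const mergedAtB = mergeGrowOnlyMax(peerB, peerA);

const timeoutTicks = 40; // hypothetical value for illustration
console.log(disconnectTick(mergedAtA.get('alice'), timeoutTicks)); // 160
console.log(disconnectTick(mergedAtB.get('alice'), timeoutTicks)); // 160
```

Because taking the maximum is commutative, associative, and idempotent, the order and repetition of forwarded beats or recovery transfers does not affect the merged result. That property is what lets peers in one connected component agree on the same disconnect tick once the newest beat has reached them.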

## Risks

1. **Fragile CDN dependencies** — if CDN URLs change, imports break silently. MITIGATED for trystero (C1, 2026-05-29): the remote URL lives in exactly ONE place — `createTrysteroTransport()`'s deferred dynamic `import()` in `transports/TrysteroTransport.js` — so a URL change is a single-line fix and never breaks module load (only a runtime browser failure surfaced at connect).
2. **No versioning** — no semver yet
3. **JS determinism is hard** — floating point across browsers, Object iteration order, etc.
4. **P2P full-mesh scaling** — passive-participation gains assume the transport scales; full mesh past ~20 peers needs relay/SFU support in the transport layer
5. **The bus design is unproven** — keeping visible state behind authoritative state for several ticks may produce its own UX problems we won't see until tested
6. **Test harness fidelity gap** — `MemoryTransport` won't catch real WebRTC / NAT / browser quirks; real-transport scenarios in Phase C are essential
7. **Predicate freezing relies on dev discipline** — even with debug-mode guards, devs running only in production mode can ship mutation bugs that desync silently
8. **Bootstrap-on-join is harness-complete, not transport-complete (B-Integrate L4, 2026-05-29):** the `SimulationEngine` B10 wiring is validated on the deterministic latency-0 shared-clock harness (round trip instantaneous; joiner tick-aligned). Three gaps need the real transport (Phase C) and are NOT yet wired — adding them now would be untested speculative code: (a) an input arriving DURING the bootstrap round trip is dropped, not buffered/replayed; (b) the joiner adopts the served *present* tick and so trails the live network by the round-trip latency (no faster-than-realtime convergence loop); (c) server selection uses the raw transport peer set with no LIVE-eligibility filter (`eligibleServers`) and no request retry, so picking a still-catching-up peer (which correctly declines) leaves the joiner waiting, and a transiently-empty peer set makes it self-found prematurely. An independent sub-agent review of the L4 wiring confirmed these are sound/harmless under the harness model and found NO bug that fires there. Resolve alongside the real-engine integration.
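
Risk 3 names object iteration order as one source of non-determinism. One common mitigation, shown here only as an illustrative sketch and not as something the library is known to ship, is to serialise state with sorted keys before hashing or comparing it. The function name `canonicalStringify` is hypothetical, and the sketch assumes plain JSON-like data (numbers, strings, booleans, `null`, arrays, plain objects).

```javascript
// Serialise a JSON-like value with object keys in sorted order, so two
// logically equal states produce the same string regardless of the order
// in which their keys were inserted.
// Assumption: input holds only JSON-like data; Map, Set, Date, functions
// and `undefined` values are out of scope for this sketch.
function canonicalStringify(value) {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved as-is.
    return '[' + value.map(canonicalStringify).join(',') + ']';
  }
  // Default sort compares UTF-16 code units, which is engine-independent.
  const keys = Object.keys(value).sort();
  const parts = keys.map(
    (key) => JSON.stringify(key) + ':' + canonicalStringify(value[key])
  );
  return '{' + parts.join(',') + '}';
}

// Same logical state, built with different key insertion order.
const stateOnPeerA = { b: 1, a: { d: [1, 2], c: null } };
const stateOnPeerB = { a: { c: null, d: [1, 2] }, b: 1 };

console.log(canonicalStringify(stateOnPeerA) === canonicalStringify(stateOnPeerB)); // true
```

This addresses only key ordering. Floating-point differences across browsers, the other hazard named in the same risk, are not touched by it and would need a separate approach.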

## Open Questions (Product)

1. Distribution model? (npm package? CDN? copy-paste?)
2. Minimum viable public API surface?
3. State serialization — dev-defined or auto-detected?
