Testing & Benchmarks

The testing infrastructure for meetingscribe validates client-side state management and cross-window synchronization without the GPU resources that production requires. A custom test harness serves the WebSocket broadcast path while stubbing the GPU-dependent backends, so the client-side handling of transcript events, speaker pulses, and reconnection is exercised end to end. This makes it possible to regression-test specific UI bugs deterministically, such as the pop-out grid being cleared or text landing in the wrong language column, in a standard CI environment without GPU access.

Test Suite Structure

The primary browser tests are located in tests/browser/test_cross_window_sync.py and are marked with pytest.mark.browser to distinguish them from unit tests ¹. These tests utilize Playwright to simulate user interactions across multiple browser contexts, specifically focusing on the synchronization between the admin window and the pop-out transcript view.
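
A custom marker such as browser is usually registered so that pytest does not warn about it and so that the marked tests can be selected on their own. The following is a minimal sketch of such a registration, assuming it lives in a conftest.py. The file location and the description string are assumptions and are not taken from the repository.

```python
# conftest.py (hypothetical location; the real project may instead declare
# the marker in pyproject.toml or pytest.ini).


def pytest_configure(config):
    """Register the `browser` marker so it can be used for test selection."""
    config.addinivalue_line(
        "markers",
        # The description text is illustrative only.
        "browser: Playwright tests that need a real browser",
    )
```

With the marker registered, `pytest -m browser` runs only the browser tests and `pytest -m "not browser"` leaves them out.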

The test suite is built around a custom live_meeting_server fixture. This fixture starts a background uvicorn server running a FastAPI application that mirrors the production server’s WebSocket behavior ². The module docstring states that the harness mounts the real view_broadcast router so that the WebSocket shape, journal replay logic, and ws_connections registry are exercised ¹. In the excerpt reproduced further down, the /api/ws/view handler is defined inside the harness and reproduces the relevant behavior of meeting_scribe.ws.view_broadcast without depending on the runtime.state singleton. The rest of the surface that the pop-out initialization touches, such as language lists and meeting status, is stubbed to avoid dependencies on external services or hardware ².
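
The journal replay and connection registry described above can be modelled in a few lines. This is a sketch under stated assumptions, not the harness itself: MeetingState, FakeSocket, connect, and broadcast are hypothetical names, and whether the real harness journals every event type is not shown in the excerpts.

```python
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class MeetingState:
    """Hypothetical stand-in for the harness state object."""
    events: list = field(default_factory=list)        # journal of broadcast events
    ws_connections: set = field(default_factory=set)  # currently connected clients


class FakeSocket:
    """Illustrative in-memory client that records everything it is sent."""

    def __init__(self):
        self.received = []

    async def send_text(self, payload: str) -> None:
        self.received.append(json.loads(payload))


async def connect(state: MeetingState, ws: FakeSocket) -> None:
    """Register the client, then replay the journal so late joiners catch up."""
    state.ws_connections.add(ws)
    for event in list(state.events):
        await ws.send_text(json.dumps(event))


async def broadcast(state: MeetingState, event: dict) -> None:
    """Record the event in the journal and fan it out to every client."""
    state.events.append(event)
    payload = json.dumps(event)
    for ws in list(state.ws_connections):
        await ws.send_text(payload)


async def demo():
    state = MeetingState()
    early, late = FakeSocket(), FakeSocket()
    await connect(state, early)
    await broadcast(state, {"segment_id": "seg-0"})
    await connect(state, late)  # joins late and receives seg-0 through replay
    await broadcast(state, {"segment_id": "seg-1"})
    return early.received, late.received


early_events, late_events = asyncio.run(demo())
```

Both clients end up with the same ordered list of events, which is the property the cross-window tests check in the browser.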

Key test cases include:

Cross-window consistency: Verifying that two independent pop-out viewers ingest the same broadcast events and render identical segment IDs ³.
Speaker pulse regression: Ensuring that speaker_pulse events, which fire every 200 ms during a meeting, do not clear the pop-out grid. The bug came from a catch-all branch in the pop-out's view-WS handler that funneled non-segment control messages into the segment store.
Reconnection logic: Confirming that a pop-out window reconnecting after a disconnect replays the journal to catch up with the admin view without duplicating segments ⁴.
Language routing: Validating that source and translation text appear in the correct columns for the (en, ja), (en, de), and (en, fr) language pairs ⁵.
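
The speaker-pulse and reconnection cases both depend on how the client-side store treats incoming messages: a message without a segment_id must be ignored, and a segment that arrives twice must not be stored twice. The following Python model illustrates those two properties. It is a sketch only: the real logic lives in JavaScript (segment-store.js), and the class name, method names, and the `type` field on the pulse message are assumptions made for illustration.

```python
class SegmentStoreModel:
    """Illustrative Python model of the client-side store; not the real JS."""

    def __init__(self):
        self.segments = {}   # segment_id -> latest event, in insertion order
        self.listeners = []  # callbacks invoked with the ingested segment_id

    def subscribe(self, fn):
        self.listeners.append(fn)

    def ingest(self, event: dict) -> bool:
        segment_id = event.get("segment_id")
        if not segment_id:
            # Control messages carry no segment_id. Returning early means
            # listeners are never called with a missing id.
            return False
        # Keyed by id, so a journal replay overwrites instead of duplicating.
        self.segments[segment_id] = event
        for fn in self.listeners:
            fn(segment_id)
        return True


store = SegmentStoreModel()
notified = []
store.subscribe(notified.append)

store.ingest({"segment_id": "seg-pre-0", "text": "first"})
store.ingest({"type": "speaker_pulse"})                     # ignored
store.ingest({"segment_id": "seg-pre-0", "text": "first"})  # replayed after reconnect
store.ingest({"segment_id": "seg-pre-1", "text": "second"})
```

After these four calls the store holds two segments, and no listener was ever invoked without an id.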

Benchmark Harnesses

The provided source material does not contain specific benchmark harnesses for translation or speakerphone hardware. The testing strategy explicitly avoids using the real meeting_scribe application lifespan because it boots vLLM backends, which require GPUs that are not available in the CI environment ¹. Instead, the focus is on client-side state management and WebSocket handling, with backends stubbed to return static or simulated data ².

CI Pipeline Configuration

The browser tests are designed to run in CI through the custom harness described above, which bypasses the heavy ASR and translation backends so that no GPU is needed ¹. The pop-out and admin pages connect to the same server instance and receive broadcasts through the shared ws_connections set, so the same client-side scribe-app.js code path that runs in production is exercised ⁶.
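
A live server started inside a CI job needs a port that does not collide with other processes. The excerpts show that the test module imports socket and that the server thread exposes a port attribute, but not how the port is chosen. The helper below is one common approach, given as a sketch; find_free_port is a hypothetical name.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # The OS assigns an ephemeral port; read it back before closing.
        return sock.getsockname()[1]


port = find_free_port()
```

The port is released when the socket closes, so another process could claim it before the server binds. For a single test process on a CI runner this window is usually acceptable.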

tests/browser/test_cross_window_sync.py L1-120 (showing 40 of 120)

"""Cross-window transcript sync (admin ↔ popout) - Playwright tests.

The bug class this catches:

  Brad opens the live meeting in the admin window AND the pop-out window
  via `?popout=view&test=1`. Admin shows the full transcript; popout shows
  fragments / nothing / wrong content. Server is broadcasting the same
  events to both - the divergence is in client-side WS handling and
  SegmentStore state.

  Most recently bit: the popout's view-WS handler had a catch-all `else`
  that funneled every non-segment control message (`speaker_pulse`,
  `seat_update`, etc.) into `store.ingest()`, which then fired listeners
  with `segment_id=undefined`, which CompactGridRenderer interpreted as
  "store cleared" - wiping the popout grid every 200 ms during a meeting.

  Admin doesn't have this problem because its audio-WS handler enumerates
  every control type explicitly. The bug lives in the seam, not the data.

What we test:

  1. Both contexts connect to the same /api/ws/view + show same segments.
  2. `speaker_pulse` ticks from the test harness DO NOT clear the popout
     grid - its child count must monotonically increase when transcript
     events arrive between pulses.
  3. After a popout WS disconnect+reconnect, the popout catches up to
     the same set of segment_ids the admin has, with no duplicates.
  4. Cross-language pair: same routing across (en, ja), (en, de),
     (en, fr) - guards against the same-script-router class.

Why a custom harness, not the real meeting_scribe app: the real lifespan
boots vLLM backends. CI doesn't have GPUs. We mount the *real* `view_broadcast`
router (so the WS shape, journal replay, and ws_connections registry are
exercised) and stub the rest of the surface the popout init touches.
"""

from __future__ import annotations

import json
import socket

tests/browser/test_cross_window_sync.py L121-240 (showing 40 of 120)

                "events": list(harness.events),
            }
        )

    @app.get("/api/meetings")
    async def meetings():
        return JSONResponse([])

    @app.get("/api/meeting/wifi")
    async def wifi():
        return JSONResponse({"available": False})

    @app.post("/api/diag/listener")
    async def diag_listener():
        return JSONResponse({"ok": True})

    # ── /api/ws/view - the route the popout connects to ──────────────
    # Reproduces the relevant behavior of meeting_scribe.ws.view_broadcast
    # without depending on the runtime.state singleton.
    @app.websocket("/api/ws/view")
    async def ws_view(websocket: WebSocket) -> None:
        await websocket.accept()
        harness.ws_connections.add(websocket)
        try:
            # Replay journal so late-joining clients catch up - same
            # contract as the real view_broadcast handler.
            for ev in list(harness.events):
                await websocket.send_text(json.dumps(ev))
            while True:
                # The popout pings periodically; we just drain.
                await websocket.receive_text()
        except Exception:
            pass
        finally:
            harness.ws_connections.discard(websocket)

    # ── /api/ws/audio - admin's audio WS, stubbed (no real ASR) ──────
    # Admin connects here when starting a meeting. We don't process
    # audio bytes - we just register the connection so it receives
    # broadcasts, then echo control messages.

tests/browser/test_cross_window_sync.py L481-600 (showing 40 of 120)

            )
            # Different speakers per segment so block-merging doesn't
            # collapse them into a single block; we want to observe each
            # segment in the rendered grid independently.
            seg["speakers"] = [{"cluster_id": i + 1, "source": "diarize"}]
            _broadcast(server, seg)

        _wait_until(page_a, "() => window._gridRenderer?._segmentMap?.size >= 3")
        _wait_until(page_b, "() => window._gridRenderer?._segmentMap?.size >= 3")

        ids_a = page_a.evaluate("() => Array.from(window._gridRenderer._segmentMap.keys()).sort()")
        ids_b = page_b.evaluate("() => Array.from(window._gridRenderer._segmentMap.keys()).sort()")

        assert ids_a == ids_b, (
            f"two popouts diverged on the same broadcast:\n  page_a = {ids_a}\n  page_b = {ids_b}"
        )
        assert len(ids_a) == 3, f"expected 3 segments, got {ids_a}"
    finally:
        ctx_a.close()
        ctx_b.close()


def test_speaker_pulse_does_not_clear_popout_grid(browser, live_meeting_server):
    """Regression for the popout-clear-on-pulse bug.

    speaker_pulse fires every 200 ms during a meeting. If the popout's
    catch-all WS branch funnels it through `store.ingest()` with no
    segment_id, CompactGridRenderer interprets the falsy id as "store
    cleared" and wipes the grid - so the popout only ever shows the
    sliver of utterances received between pulses.

    The fix lives in segment-store.js (early-return when no segment_id).
    This test fails BEFORE the fix and passes after.
    """
    server = live_meeting_server
    base = server["base_url"]

    popout_ctx = browser.new_context()
    popout_page = popout_ctx.new_page()

tests/browser/test_cross_window_sync.py L601-720 (showing 40 of 120)

    server = live_meeting_server
    base = server["base_url"]
    harness: _MeetingState = server["harness"]

    popout_ctx = browser.new_context()
    popout_page = popout_ctx.new_page()

    try:
        popout_page.goto(f"{base}/?popout=view&test=1", wait_until="domcontentloaded")
        _wait_until(popout_page, "() => !!window._gridRenderer")
        _wait_for_popout_ws_open(popout_page)

        # Land 2 segments - different speakers so blocks don't merge.
        for i in range(2):
            seg = _make_segment(
                segment_id=f"seg-pre-{i}",
                text=f"Pre-disconnect segment {i}.",
                start_ms=i * 2000,
                translation_text=f"切断前{i}",
            )
            seg["speakers"] = [{"cluster_id": i + 1, "source": "diarize"}]
            _broadcast(server, seg)

        _wait_until(
            popout_page,
            "() => window._gridRenderer?._segmentMap?.size >= 2",
        )

        # Force-close the popout's view-WS by hitting the test endpoint;
        # the popout's auto-reconnect kicks in.
        import urllib.request

        urllib.request.urlopen(
            urllib.request.Request(f"{base}/test/disconnect_all", method="POST"),
            timeout=2,
        ).read()

        # During the disconnect, broadcast 2 more segments. They land in
        # the harness journal but the popout is offline. allow_zero=True
        # because the popout is intentionally disconnected here.

tests/browser/test_cross_window_sync.py L721-740

        cols = popout_page.evaluate(
            """() => {
                const block = document.querySelector('#transcript-grid .compact-block');
                if (!block) return null;
                return {
                    a: block.querySelector('.compact-col-a')?.textContent || '',
                    b: block.querySelector('.compact-col-b')?.textContent || '',
                };
            }"""
        )
        assert cols is not None
        assert src_text.split(".")[0] in cols["a"], (
            f"source text not in column A for {source_lang}↔{target_lang}: colA={cols['a']!r}"
        )
        assert tgt_text.split(".")[0].split("。")[0] in cols["b"], (
            f"translation not in column B for {source_lang}↔{target_lang}: colB={cols['b']!r}"
        )
    finally:
        popout_ctx.close()

tests/browser/test_cross_window_sync.py L241-360 (showing 40 of 120)

        self._started.wait(timeout)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                urllib.request.urlopen(f"http://127.0.0.1:{self.port}/api/status", timeout=1)
                return
            except Exception:
                time.sleep(0.05)
        raise RuntimeError(f"live_meeting_server did not start on port {self.port}")

    def stop(self) -> None:
        if hasattr(self, "_server"):
            self._server.should_exit = True


@pytest.fixture
def live_meeting_server() -> Generator[dict[str, Any]]:
    """Start a live FastAPI app with stubbed backends + real WS routing.

    The popout and admin pages connect to the *same* server instance,
    receive broadcasts via the same `ws_connections` set, and exercise
    the same client-side scribe-app.js code path that runs in production.
    """
    harness = _MeetingState()
    app = _build_app(harness)
    thread = _ServerThread(app)
    thread.start()
    thread.wait_ready()
    try:
        yield {
            "base_url": f"http://127.0.0.1:{thread.port}",
            "harness": harness,
        }
    finally:
        thread.stop()
        thread.join(timeout=3.0)


# ── Helpers ──────────────────────────────────────────────────────────────