Infrastructure & Deployment

The meeting-scribe infrastructure layer manages the lifecycle of services on GB10 hardware, primarily targeting k3s-based deployments. It provides a unified interface for executing commands locally or via SSH, prefetches HuggingFace model weights into a shared host cache so that offline pods can find them, and verifies service health through HTTP polling. This section details the components responsible for container orchestration support, model caching, and system readiness checks.

Cluster Management and Command Execution

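LocalRunner executes commands on the local host through subprocess and exposes the same surface as SSHRunner, so callers can use either one; it is used when meeting-scribe runs on the GB10 itself. For container management, it exposes methods to interact with Docker, including docker_run, docker_stop, docker_remove, and docker_restart ¹. The docker_container_exists method returns an (exists, running) pair that lets callers choose between docker start for an existing container and docker run for a fresh one. The docker_run method configures containers with defaults suitable for ML workloads, such as host networking, GPU access, and shared memory limits. The docker_restart method recovers containers whose process is alive but whose CUDA context is corrupted (for example, pyannote after concurrent calls wedged the GPU), and is much faster than stopping the container and bringing it back up ².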
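The choice between starting an existing container and running a fresh one can be sketched as follows. This is a minimal illustration, not the project's start_container() helper: FakeRunner and plan_start are hypothetical names, and the fake only mimics the docker_container_exists and docker_start return values shown in local.py.

```python
from __future__ import annotations


class FakeRunner:
    """Hypothetical stand-in exposing the two LocalRunner calls used here."""

    def __init__(self, containers: dict[str, bool]) -> None:
        # Maps container name -> running flag; absent names do not exist.
        self.containers = containers
        self.started: list[str] = []

    def docker_container_exists(self, name: str) -> tuple[bool, bool]:
        if name not in self.containers:
            return (False, False)
        return (True, self.containers[name])

    def docker_start(self, container_id: str) -> bool:
        if container_id not in self.containers:
            return False
        self.containers[container_id] = True
        self.started.append(container_id)
        return True


def plan_start(runner, name: str) -> str:
    """Return the action a start_container()-style helper might take."""
    exists, running = runner.docker_container_exists(name)
    if not exists:
        return "run"  # no such container: a fresh `docker run` is needed
    if running:
        return "noop"  # already up: nothing to do
    # Container exists but is stopped: reuse it with `docker start`.
    return "start" if runner.docker_start(name) else "failed"
```

Because the function only depends on the two method names, the same logic would presumably work with either runner, assuming SSHRunner returns the same shapes.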

HuggingFace Model Prefetching

The pull_models function in src/meeting_scribe/infra/containers.py is responsible for downloading HuggingFace model weights to the GB10’s shared host cache (/data/huggingface) ³. This pre-flight step ensures that offline k3s pods can find the required models.

For local execution, the function calls the huggingface_hub.snapshot_download API in-process. The shipped hf CLI is avoided because environment path issues and exit code leaks make it unreliable for scripting, and the legacy huggingface-cli is avoided because it acts as a no-op shim that silently performs empty pulls. Any download failure raises an exception instead of leaving a partial or empty pull, which would cause pods to crash-loop with LocalEntryNotFoundError.
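A minimal sketch of the in-process path, assuming one snapshot_download call per model with failures left to propagate. The prefetch_local name and the injectable download parameter are hypothetical (the latter exists only so the sketch can be exercised without network access); the real pull_models body is not shown in the excerpt.

```python
from __future__ import annotations

from typing import Callable


def prefetch_local(
    model_ids: list[str],
    cache_dir: str,
    token: str | None = None,
    download: Callable[..., str] | None = None,
) -> list[str]:
    """Download each repo into cache_dir and return the snapshot paths."""
    if download is None:
        # Imported lazily so the sketch can be tested without the package.
        from huggingface_hub import snapshot_download

        download = snapshot_download

    snapshot_paths = []
    for repo_id in model_ids:
        # Deliberately no try/except: an exception aborts the whole
        # pre-flight step instead of leaving a partial or empty cache.
        snapshot_paths.append(
            download(repo_id=repo_id, cache_dir=cache_dir, token=token)
        )
    return snapshot_paths
```

Letting the exception escape means the caller sees the failure before any pod is scheduled, rather than discovering it later as a crash loop.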

For remote SSH targets, the function executes the hf download command over SSH. This ensures the models are downloaded to the remote GB10 node rather than the local machine. The function manages the HF_TOKEN environment variable to handle gated repositories.
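One way the remote command could be assembled is sketched below. The build_remote_pull name is hypothetical, and passing the cache location through the HF_HUB_CACHE environment variable is an assumption; the excerpt does not show how the real function points hf download at the cache or how it forwards HF_TOKEN.

```python
from __future__ import annotations

import shlex


def build_remote_pull(
    ssh_target: str,
    repo_id: str,
    hf_cache_dir: str = "/data/huggingface",
    hf_token: str | None = None,
) -> list[str]:
    """Build an argv that runs `hf download` on the remote node."""
    # Assumption: the cache location is conveyed via HF_HUB_CACHE.
    env = {"HF_HUB_CACHE": hf_cache_dir}
    if hf_token:
        # Only needed for gated repositories.
        env["HF_TOKEN"] = hf_token
    # Quote every value because the remote side re-parses the string in a shell.
    assignments = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    remote = f"env {assignments} hf download {shlex.quote(repo_id)}"
    return ["ssh", ssh_target, remote]
```

Note that a token embedded in the command line is visible in the remote process list while the download runs, so a real implementation may prefer another way of passing it.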

Service Health Checking

The check_service function handles the polling logic with configurable timeouts and retry intervals ⁴. It returns True if the service responds with a 200 status code. The check_all_services function checks multiple services concurrently using asyncio.gather ⁵. When wait=True, all services share a total timeout deadline, ensuring the total wait time is determined by the slowest service rather than the sum of all services.
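The shared-deadline behaviour can be illustrated with a small sketch that replaces the HTTP probes with in-memory ones. All names here (wait_until_healthy, check_all, healthy_after) are hypothetical; only the use of asyncio.gather and a single deadline shared across services comes from the description above.

```python
from __future__ import annotations

import asyncio
import time
from typing import Awaitable, Callable

Probe = Callable[[], Awaitable[bool]]


async def wait_until_healthy(probe: Probe, deadline: float, interval: float) -> bool:
    """Poll probe() until it returns True or the shared deadline passes."""
    while True:
        if await probe():
            return True
        if time.monotonic() + interval > deadline:
            return False  # another retry would overshoot the deadline
        await asyncio.sleep(interval)


async def check_all(
    probes: dict[str, Probe],
    total_timeout: float,
    interval: float = 0.01,
) -> dict[str, bool]:
    # One deadline for every service, computed before any polling starts.
    deadline = time.monotonic() + total_timeout
    results = await asyncio.gather(
        *(wait_until_healthy(p, deadline, interval) for p in probes.values())
    )
    # gather() returns results in the order the awaitables were passed.
    return dict(zip(probes.keys(), results))


def healthy_after(n: int) -> Probe:
    """Hypothetical probe that starts succeeding on its n-th call."""
    calls = 0

    async def probe() -> bool:
        nonlocal calls
        calls += 1
        return calls >= n

    return probe
```

Since all pollers run concurrently against the same deadline, a service that never comes up costs at most total_timeout, regardless of how many other services are being checked.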

If a service is healthy, the checker also attempts to retrieve the loaded model ID from the /v1/models endpoint ⁴. This information is included in the ServiceStatus dataclass.
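Extracting the model ID might look like the sketch below. It assumes the endpoint returns an OpenAI-style listing of the form {"data": [{"id": ...}]}; that payload shape and the extract_model_id name are assumptions, since the excerpt does not show the parsing code. Returning None on any unexpected shape keeps a healthy service from being reported as failed just because the model lookup did not work.

```python
from __future__ import annotations

from typing import Any


def extract_model_id(payload: Any) -> str | None:
    """Return the first model ID from a /v1/models-style payload, if any."""
    if not isinstance(payload, dict):
        return None
    data = payload.get("data")
    if not isinstance(data, list) or not data:
        return None
    first = data[0]
    # Tolerate malformed entries instead of raising.
    if isinstance(first, dict) and isinstance(first.get("id"), str):
        return first["id"]
    return None
```

The result would map naturally onto the optional model field of ServiceStatus, which defaults to None.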

Privileged Helper Daemon

The provided sources do not contain information about a privileged helper daemon for root-level operations. The LocalRunner and SSHRunner abstractions handle command execution, but the specific implementation of a privileged daemon is not described in the referenced files.

src/meeting_scribe/infra/local.py L1-120 (showing 40 of 120)

"""Local command runner - mirrors SSHRunner but executes via subprocess.

Used when meeting-scribe runs on the GB10 itself (the common dev setup).
"""

from __future__ import annotations

import logging
import subprocess

logger = logging.getLogger(__name__)


class LocalRunner:
    """Execute commands on the local host with the same surface as SSHRunner."""

    def __init__(self) -> None:
        self.node = None

    @property
    def ssh_target(self) -> str:
        return "local"

    def run(
        self,
        cmd: list[str],
        timeout: int = 30,
        check: bool = True,
    ) -> subprocess.CompletedProcess[str]:
        logger.debug("LOCAL: %s", " ".join(cmd))
        return subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=check,
        )

    def run_bg(self, cmd: list[str]) -> str:
        proc = subprocess.Popen(

src/meeting_scribe/infra/local.py L121-172 (showing 40 of 52)


    def docker_container_exists(self, name: str) -> tuple[bool, bool]:
        """Return (exists, running). Used by start_container() to pick
        between `docker start` (existing) and `docker run` (fresh)."""
        result = self.run(
            ["docker", "inspect", "--format", "{{.State.Running}}", name],
            timeout=10,
            check=False,
        )
        if result.returncode != 0:
            return (False, False)
        running = result.stdout.strip().lower() == "true"
        return (True, running)

    def docker_start(self, container_id: str) -> bool:
        result = self.run(
            ["docker", "start", container_id],
            timeout=30,
            check=False,
        )
        return result.returncode == 0

    def docker_remove(self, container_id: str, force: bool = True) -> bool:
        args = ["docker", "rm"]
        if force:
            args.append("-f")
        args.append(container_id)
        result = self.run(args, timeout=30, check=False)
        return result.returncode == 0

    def docker_restart(self, container_id: str, timeout: int = 30) -> bool:
        """Restart a running container in place (single docker restart).

        Used to recover a container whose process is alive but whose
        CUDA context has been corrupted (e.g. pyannote after concurrent
        calls wedged the GPU). Much faster than docker stop + up for
        a single container.
        """
        result = self.run(
            ["docker", "restart", "-t", str(timeout), container_id],

src/meeting_scribe/infra/containers.py L1-105 (showing 40 of 105)

"""HuggingFace model pre-fetch for the GB10 model stack (k3s-only).

The backends run as k3s pods (`helm/meeting-scribe`); this module keeps the one
pre-flight helper that isn't k3s's job - pulling HuggingFace model weights into
the shared host cache (`/data/huggingface`) so the offline pods find them. The
container-lifecycle helpers were removed with docker-compose (k3s owns the pods).
"""

from __future__ import annotations

import logging
import os
from pathlib import Path
from typing import TYPE_CHECKING

from meeting_scribe import paths

if TYPE_CHECKING:
    from meeting_scribe.infra.local import LocalRunner
    from meeting_scribe.infra.ssh import SSHRunner

    # Either runner works - both expose `run`, `rsync`, etc.
    Runner = LocalRunner | SSHRunner
else:
    Runner = object

logger = logging.getLogger(__name__)


def pull_models(
    ssh: Runner,
    model_ids: list[str],
    hf_cache_dir: str = str(paths.DEFAULT_HF_CACHE_DIR),
    *,
    hf_token: str | None = None,
) -> None:
    """Download models to the GB10's HuggingFace cache.

    For a LOCAL target the download runs in-process via
    ``huggingface_hub.snapshot_download`` - NOT the ``hf`` CLI. The shipped

src/meeting_scribe/infra/health.py L1-120 (showing 40 of 120)

"""Service health checking with retry and timeout.

Polls HTTP health endpoints until they respond or timeout is reached.
Pattern adapted from auto-sre's _wait_for_vllm().
"""

from __future__ import annotations

import asyncio
import logging
from dataclasses import dataclass

import httpx

logger = logging.getLogger(__name__)

# Default service ports (host networking, no port mapping)
SERVICE_PORTS = {
    "translation": 8010,
    "diarization": 8001,
    "tts": 8002,
    "asr": 8003,
}


@dataclass
class ServiceStatus:
    """Health status of a single service."""

    name: str
    url: str
    healthy: bool
    model: str | None = None
    error: str | None = None


async def check_service(
    url: str,
    *,
    timeout: float = 5.0,

src/meeting_scribe/infra/health.py L121-142

    shared ``total_timeout`` deadline. Total wait = max(slowest_service)
    instead of sum(all_services).

    Args:
        host: GB10 IP or hostname.
        ports: Override default service ports.
        wait: If True, wait for each service to become healthy.
        total_timeout: Max wait time (shared across all services when
                       wait=True).

    Returns:
        Dict mapping service name to ServiceStatus.
    """
    svc_ports = ports or SERVICE_PORTS

    tasks = [
        _check_one_service(name, port, host, wait=wait, total_timeout=total_timeout)
        for name, port in svc_ports.items()
    ]
    pairs = await asyncio.gather(*tasks)
    return {name: status for name, status in pairs}