
Architecture Overview

The gb10-provision system is a self-contained, state-free PXE/onboarding control plane designed to bring a fresh NVIDIA GB10 (DGX Spark) appliance online over an isolated provisioning LAN 1. It operates as a set of container images and a Helm chart deployed on the appliance’s own k3s cluster. The architecture relies on an isolated-LAN security model where the API binds only to the provisioning interface and assumes the network is physically or logically isolated. The system is state-free, meaning no site inventory, credentials, MACs, or operator data are stored in the repository; all host-specific information is supplied at deploy time via Helm values or environment variables.
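The bind rule can be sketched in Python, since the API is FastAPI-based. This is a minimal sketch under stated assumptions: `resolve_bind_host` is a hypothetical helper, and the `LAN_IP` variable name mirrors the `--host $(LAN_IP)` example used for the API. It refuses to start rather than fall back to a wildcard address such as `0.0.0.0` or `::`.

```python
import ipaddress
import os


def resolve_bind_host(env_var: str = "LAN_IP") -> str:
    """Return the provisioning-interface address the API should bind to.

    Hypothetical helper: reads the address from an environment variable
    and rejects wildcard (unspecified) addresses instead of guessing.
    """
    raw = os.environ.get(env_var)
    if not raw:
        # No address supplied at deploy time: fail instead of binding everywhere.
        raise RuntimeError(f"{env_var} is not set; refusing to guess a bind address")
    # Raises ValueError if the value is not a valid IPv4/IPv6 address.
    addr = ipaddress.ip_address(raw.strip())
    if addr.is_unspecified:
        # 0.0.0.0 and :: would expose the API on every interface.
        raise ValueError(f"refusing to bind to wildcard address {addr}")
    return str(addr)
```

A launcher could pass the result to uvicorn, for example `uvicorn.run("app:app", host=resolve_bind_host())`, where the `app:app` import path is a placeholder.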

The system consists of its container images and a Helm chart that orchestrates them. The main components are:

  • gb10-dnsmasq: Provides DHCP and TFTP/PXE services specifically for the isolated provisioning network.
  • gb10-provision-api: A FastAPI-based control plane that serves cloud-init and autoinstall configurations. It tracks per-client onboarding state and renders user-data.
  • Helm Chart (gb10-provision): Wires the dnsmasq, nginx, and API components together on the appliance's k3s cluster.
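Per-client onboarding state can be tracked without writing anything to the repository or to disk. The sketch below is hypothetical: the `Stage` names, the `client_id` key, and the in-memory `OnboardingTracker` class are illustrative assumptions, not the API's documented data model.

```python
import threading
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    # Hypothetical onboarding stages, for illustration only.
    DISCOVERED = "discovered"
    SEED_SERVED = "seed-served"
    COMPLETE = "complete"


# Each stage may only advance to the next one; COMPLETE is terminal.
_NEXT = {
    Stage.DISCOVERED: Stage.SEED_SERVED,
    Stage.SEED_SERVED: Stage.COMPLETE,
}


class OnboardingTracker:
    """Thread-safe, in-memory map from client id to onboarding stage."""

    def __init__(self) -> None:
        self._stages: Dict[str, Stage] = {}
        self._lock = threading.Lock()

    def register(self, client_id: str) -> Stage:
        # Re-registering a known client keeps its current stage.
        with self._lock:
            return self._stages.setdefault(client_id, Stage.DISCOVERED)

    def advance(self, client_id: str) -> Stage:
        with self._lock:
            current = self._stages.get(client_id)
            if current is None:
                raise KeyError(f"unknown client {client_id!r}")
            nxt = _NEXT.get(current)
            if nxt is None:
                raise ValueError(f"client {client_id!r} is already {current.value}")
            self._stages[client_id] = nxt
            return nxt

    def stage(self, client_id: str) -> Optional[Stage]:
        with self._lock:
            return self._stages.get(client_id)
```

Because the map lives only in process memory, restarting the pod discards it, which is consistent with keeping no site inventory in the repository.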

During provisioning, the client appliance interacts with the control-plane components in sequence. gb10-dnsmasq handles initial network discovery over DHCP and boots the client via TFTP/PXE. gb10-provision-api then serves the configuration the installer needs, such as the cloud-init and autoinstall seeds. The Helm chart integrates these components within the k3s environment.
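The dnsmasq side of this flow can be illustrated by rendering its configuration from deploy-time values. This is a minimal sketch, assuming the values arrive as a plain mapping (for example, assembled from Helm values or environment variables); `render_dnsmasq_conf` and its key names are hypothetical, while `interface`, `bind-interfaces`, `port`, `dhcp-range`, `enable-tftp`, `tftp-root`, and `dhcp-boot` are standard dnsmasq options.

```python
from typing import Mapping


def render_dnsmasq_conf(values: Mapping[str, str]) -> str:
    """Build a dnsmasq config scoped to the isolated provisioning LAN.

    All keys are hypothetical deploy-time values. A missing key raises
    KeyError instead of producing a half-configured DHCP server.
    """
    lines = [
        f"interface={values['iface']}",  # listen only on the provisioning NIC
        "bind-interfaces",               # bind that interface, not the wildcard address
        "port=0",                        # disable DNS; only DHCP and TFTP are needed
        f"dhcp-range={values['range_start']},{values['range_end']},{values['lease']}",
        "enable-tftp",                   # built-in TFTP server for PXE
        f"tftp-root={values['tftp_root']}",
        f"dhcp-boot={values['boot_file']}",  # boot file handed to PXE clients
    ]
    return "\n".join(lines) + "\n"
```

Indexing the mapping directly, rather than using `.get()` with defaults, mirrors the fail-loudly templating rule: a missing value stops the deploy instead of yielding a DHCP server with a partial configuration.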


The architecture is guided by three key design principles:

  1. State-free: No site inventory, credentials, MAC addresses, or operator data are stored in the repository. All host-specific data is supplied at deploy time via Helm values or environment variables.
  2. Isolated-LAN Security: The API binds only to the provisioning interface (e.g., --host $(LAN_IP)), never to 0.0.0.0. It runs unprivileged and assumes the provisioning network is isolated; it is not intended to be exposed on a routable network.
  3. Strict Templating: Jinja rendering uses StrictUndefined, so any missing variable fails loudly at render time instead of producing a broken autoinstall seed.
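The strict-templating rule can be demonstrated with Jinja's `StrictUndefined`. The template below is an illustrative autoinstall fragment, not a complete or valid seed, and `render_seed` is a hypothetical wrapper; the `jinja2` package must be installed.

```python
from jinja2 import Environment, StrictUndefined, UndefinedError

# Illustrative fragment only; a real autoinstall seed needs more fields.
TEMPLATE = """#cloud-config
autoinstall:
  version: 1
  identity:
    hostname: {{ hostname }}
"""

# StrictUndefined makes any reference to a missing variable raise an error.
env = Environment(undefined=StrictUndefined)
template = env.from_string(TEMPLATE)


def render_seed(**values: str) -> str:
    """Render the seed; raises UndefinedError if a variable is missing."""
    return template.render(**values)


def try_render(**values: str) -> str:
    """Show the failure mode: report the error instead of emitting a bad seed."""
    try:
        return render_seed(**values)
    except UndefinedError as exc:
        return f"render failed: {exc}"
```

`render_seed(hostname="gb10-01")` renders normally, while `render_seed()` raises `UndefinedError`. With Jinja's default `Undefined`, the same call would silently emit `hostname:` with an empty value.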