Architecture Overview

The gb10-provision system is a self-contained, state-free PXE/onboarding control plane designed to bring a fresh NVIDIA GB10 (DGX Spark) appliance online over an isolated provisioning LAN¹. It ships as a set of container images and a Helm chart deployed on the appliance’s own k3s cluster. The security model depends on an isolated LAN: the API binds only to the provisioning interface and assumes the network is physically or logically isolated. State-free means that no site inventory, credentials, MACs, or operator data are stored in the repository; all host-specific information is supplied at deploy time via Helm values or environment variables.

Core Components

The repository ships two container images and a Helm chart that orchestrates them alongside nginx:

gb10-dnsmasq: Provides DHCP and TFTP/PXE services specifically for the isolated provisioning network.
gb10-provision-api: A FastAPI-based control plane that serves cloud-init and autoinstall configurations. It tracks per-client onboarding state and renders user-data.
Helm Chart (gb10-provision): Wires dnsmasq, nginx, and the API together.

Data Flow and Interaction

During provisioning, the client appliance interacts with the control plane in stages. The gb10-dnsmasq component handles initial network discovery and boot over DHCP and TFTP/PXE. The gb10-provision-api then serves the configuration the client needs, namely the cloud-init and autoinstall seeds, and tracks each client's onboarding state. The Helm chart wires these components together within the k3s environment.

Design Principles

The architecture is guided by three key design principles:

State-free: No site inventory, credentials, MAC addresses, or operator data live in the repository. All host-specific data is supplied at deploy time via Helm values or environment variables.
Isolated-LAN Security: The API binds exclusively to the provisioning interface (e.g., --host $(LAN_IP)) and never to 0.0.0.0. It runs unprivileged and assumes the provisioning network is physically or logically isolated; it is not intended to be exposed on a routable network.
Strict Templating: Jinja rendering uses StrictUndefined, so any missing variable causes a loud failure rather than producing a broken autoinstall seed.
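
The strict-templating principle can be illustrated with a minimal sketch. It assumes the jinja2 package is installed; the template text, the variable names (hostname, lan_ip), and the helper render_user_data are hypothetical and are not taken from the project's actual templates or code.

```python
from jinja2 import Environment, StrictUndefined, UndefinedError

# Hypothetical template text; the project's real autoinstall template is not shown here.
TEMPLATE = "hostname: {{ hostname }}\nlan_ip: {{ lan_ip }}\n"

# StrictUndefined turns any reference to a missing variable into an error at render time.
env = Environment(undefined=StrictUndefined)


def render_user_data(**values: str) -> str:
    """Render the template, raising UndefinedError if any variable is missing."""
    return env.from_string(TEMPLATE).render(**values)


# All variables supplied: rendering succeeds.
complete = render_user_data(hostname="spark-01", lan_ip="10.0.0.1")

# One variable missing: rendering fails instead of emitting an empty value.
try:
    render_user_data(hostname="spark-01")
    missing_failed = False
except UndefinedError:
    missing_failed = True
```

With Jinja's default Undefined, the second call would render lan_ip as an empty string, which is the kind of silently broken seed this principle is meant to prevent.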

README.md L1-49 (showing 40 of 49)

# gb10-provision

> **Disclaimer: unofficial and unsupported.** Provided for testing and
> evaluation only, on an "AS IS" basis, with no warranty and no support. Not
> affiliated with or endorsed by Dell. See [DISCLAIMER.md](DISCLAIMER.md).

Wiki: https://sddcinfo.github.io/provisioning/


A self-contained, state-free PXE/onboarding **control plane** for bringing a fresh
NVIDIA GB10 (DGX Spark) appliance online over an isolated provisioning LAN.

It ships as container images + a Helm chart deployed on the appliance's own k3s:

```
containers/
  gb10-dnsmasq/ # DHCP + TFTP/PXE for the isolated provisioning network
  gb10-provision-api/ # FastAPI control-plane: serves cloud-init/autoinstall,
                       # tracks per-client onboarding state, renders user-data
helm/
  gb10-provision/ # Helm chart wiring dnsmasq + nginx + the API together
```

## Design

- **State-free.** No site inventory, credentials, MACs, or operator data live in this
  repo. Everything host-specific is supplied at deploy time via Helm values / env.
- **Isolated-LAN security model.** The API binds only to the provisioning interface
  (`--host $(LAN_IP)`, never `0.0.0.0`) and runs unprivileged. It assumes the
  provisioning network is physically/logically isolated; it is **not** intended to be
  exposed on a routable network.
- **Strict templating.** Jinja rendering uses `StrictUndefined`, so a missing variable
  fails loudly rather than shipping a broken autoinstall seed.
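
As a rough illustration of the isolated-LAN rule, the sketch below validates a bind address before it is handed to the server. It is an assumption about how such a check might look, not the project's code: the helper name `resolve_bind_host` is hypothetical, and only the `LAN_IP` variable name comes from the `--host $(LAN_IP)` flag above.

```python
import ipaddress
from typing import Mapping


def resolve_bind_host(env: Mapping[str, str]) -> str:
    """Return the address to bind to, refusing missing or wildcard values.

    Hypothetical helper: reads LAN_IP from the given environment mapping.
    """
    raw = env.get("LAN_IP", "").strip()
    if not raw:
        # State-free: nothing is baked in, so the value must arrive at deploy time.
        raise ValueError("LAN_IP must be set at deploy time")
    addr = ipaddress.ip_address(raw)  # raises ValueError on malformed input
    if addr.is_unspecified:  # 0.0.0.0 or ::
        raise ValueError("refusing to bind to a wildcard address")
    return str(addr)
```

Calling `resolve_bind_host(os.environ)` at startup and passing the result as the server's host would make a misconfigured deployment fail immediately rather than listen on every interface.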

## Deploy

```bash
# Import the images into the appliance's k3s containerd, then:
helm upgrade --install gb10-provision ./helm/gb10-provision \
  --namespace gb10-provision --create-namespace \