Helm Deployment
The gb10-provision Helm chart deploys a PXE provisioning stack consisting of dnsmasq, nginx, and a provision-api service, designed specifically for bare-metal GB10 onboarding [1]. Because PXE-booting devices have no IP address yet and need direct LAN access, all pods set hostNetwork: true and bind directly to the physical LAN interface rather than using ClusterIP services [2]. The architecture relies on a shared lan-setup initContainer that discovers the live interface IP at runtime, so services bind to the correct address even after network changes or reboots [3].
Chart Structure and Configuration
The chart is defined in Chart.yaml as an application type with version 0.1.0 and app version 1.0.0 [1]. Configuration is managed via values.yaml, which requires lanIface (the physical interface name) to be set explicitly, as there is no default [2]. The lanIp value defaults to 10.88.0.1 but is primarily used in “direct” mode to assert the IP on the interface.
Key configuration values include:
- provisionPort: Defaults to 8088, serving as the single source of truth for the API’s bind port, container port, and nginx upstream.
- nginxHttpPort: Defaults to 80, the port nginx serves HTTP content on.
- dataPath: Defaults to /data/provision, a HostPath volume pre-populated by the CLI.
- dnsmasq.mode: Can be direct (full DHCP on isolated LAN) or proxy (proxy-DHCP on shared L2 switch).
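As a rough illustration of how these values interact, the following Python sketch layers overrides on the documented defaults and reports obvious mistakes before an install. It is not part of the chart: `check_values`, the problem messages, and the interface name `enp1s0` are hypothetical, and the merge is a simplification of Helm's own value merging.

```python
# Defaults copied from the values described above; the rest is illustrative.
DEFAULTS = {
    "lanIface": "",  # required, no default
    "lanIp": "10.88.0.1",
    "provisionPort": 8088,
    "nginxHttpPort": 80,
    "dataPath": "/data/provision",
    "dnsmasq": {"mode": "direct"},
}


def check_values(overrides):
    """Layer overrides on the defaults and return a list of problems (empty if none)."""
    merged = {**DEFAULTS, **overrides}
    # Helm merges nested maps key by key; this sketch only does so for "dnsmasq".
    merged["dnsmasq"] = {**DEFAULTS["dnsmasq"], **overrides.get("dnsmasq", {})}

    problems = []
    if not merged["lanIface"]:
        problems.append("lanIface is required and has no default")
    if merged["dnsmasq"]["mode"] not in ("direct", "proxy"):
        problems.append("dnsmasq.mode must be 'direct' or 'proxy'")
    for key in ("provisionPort", "nginxHttpPort"):
        if not 1 <= int(merged[key]) <= 65535:
            problems.append(f"{key} must be a valid TCP port")
    return problems
```

Calling `check_values({})` reports the missing lanIface, while `check_values({"lanIface": "enp1s0", "dnsmasq": {"mode": "proxy"}})` returns an empty list.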
Network Initialization and IP Discovery
To handle dynamic IP addressing without baking IPs into manifests, the chart uses a shared lan-setup initContainer defined in _helpers.tpl [3]. This initContainer runs in every pod and writes the live bind IP to a shared emptyDir volume at /run/lan/ip.
The behavior of lan-setup depends on the dnsmasq.mode:
- Direct Mode: The initContainer asserts the configured lanIp on the interface using NET_ADMIN capabilities, then publishes that IP.
- Proxy Mode: The initContainer waits for the interface to acquire any IPv4 address via DHCP lease, then publishes that live IP.
The main containers (nginx, provision-api) mount this volume read-only and read /run/lan/ip at startup to bind to the correct address [4][5].
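The publish-then-read handshake can be modelled in a few lines of Python. This is a sketch of the decision logic only, not the chart's initContainer (which runs in the dnsmasq image and mutates host networking in direct mode); the function names are hypothetical, and the list of live addresses is passed in rather than queried from the interface.

```python
import ipaddress
from pathlib import Path


def pick_bind_ip(mode, configured_ip, live_ipv4):
    """Return the IP to publish, or None if the caller should keep waiting."""
    if mode == "direct":
        # Direct mode publishes the configured address after asserting it.
        return str(ipaddress.IPv4Address(configured_ip))
    if mode == "proxy":
        # Proxy mode publishes whatever the interface currently holds.
        return live_ipv4[0] if live_ipv4 else None
    raise ValueError(f"unknown mode: {mode!r}")


def publish(ip, path):
    """Write the bind IP where the main container expects to find it."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(ip + "\n")


def read_published(path):
    """What a main container would do at startup: read and validate the IP."""
    ip = Path(path).read_text().strip()
    ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return ip
```

In proxy mode a `None` result means no lease has arrived yet, so a real init step would sleep and retry rather than publish a stale or configured value.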
Component Deployments
The nginx deployment serves static assets (boot chain, images, repos) and proxies dynamic requests to the API [4]. It uses a ConfigMap to store the nginx configuration template, which includes ${LAN_IP} as a placeholder. At container start, envsubst replaces ${LAN_IP} with the live IP from /run/lan/ip before starting nginx. The deployment uses a Recreate strategy to ensure only one pod binds the LAN ports at a time. A checksum annotation on the pod template triggers a rollout if the nginx configuration content changes.
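The two mechanisms in this paragraph, allow-listed placeholder substitution and a content checksum, can be approximated in Python as follows. This is an illustrative stand-in, not the container's actual startup script: `render_template` and `config_checksum` are hypothetical names, and unlike envsubst the sketch only recognises the braced `${NAME}` form.

```python
import hashlib
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(template, allowed):
    """Replace ${NAME} only for names in `allowed`; leave everything else untouched."""
    def replace(match):
        name = match.group(1)
        # Unknown names keep their original text, so nginx variables survive.
        return allowed.get(name, match.group(0))
    return _PLACEHOLDER.sub(replace, template)


def config_checksum(text):
    """Hex SHA-256 of the config text, usable as a pod-template annotation value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


template = "listen ${LAN_IP}:80;\nproxy_set_header Host $host;\n"
rendered = render_template(template, {"LAN_IP": "10.88.0.1"})
```

Restricting substitution to an allow-list matters because nginx variables such as `$host` must reach nginx unchanged. The checksum would be taken over the template as rendered by Helm, with `${LAN_IP}` still in place, so an IP change alone would not alter it.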
DNSmasq
DNSmasq handles DHCP and TFTP services [6]. It binds to the specified lanIface and excludes the loopback interface.
- Direct Mode: Acts as the primary DHCP server, handing out addresses from dhcpRange and booting EFI clients via TFTP.
- Proxy Mode: Acts as a proxy-DHCP, providing PXE boot information without handing out IPs, coexisting with an existing DHCP server.
DNSmasq requires NET_ADMIN and NET_RAW capabilities. It also uses the Recreate strategy.
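To make the difference between the two modes concrete, the sketch below builds a plausible set of dnsmasq options from the chart values. It is an assumption about how the values might map to dnsmasq configuration, not the chart's rendered config: `dnsmasq_options` is a hypothetical helper, and boot-file options are left out because the source does not show them.

```python
def dnsmasq_options(mode, lan_iface,
                    dhcp_range="10.88.0.100,10.88.0.200,12h",
                    proxy_subnet="",
                    tftp_root="/data/provision/tftp"):
    """Return dnsmasq config lines for the given mode (illustrative only)."""
    if not lan_iface:
        raise ValueError("lan_iface is required")

    lines = [
        f"interface={lan_iface}",   # answer only on the LAN interface
        "except-interface=lo",      # never on loopback
        "enable-tftp",
        f"tftp-root={tftp_root}",
    ]
    if mode == "direct":
        # Full DHCP server: hand out leases from the configured range.
        lines.append(f"dhcp-range={dhcp_range}")
    elif mode == "proxy":
        if not proxy_subnet:
            raise ValueError("proxy mode needs proxy_subnet")
        # Proxy-DHCP: no leases, only PXE information on this subnet.
        lines.append(f"dhcp-range={proxy_subnet},proxy")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return lines
```

For example, `dnsmasq_options("proxy", "enp1s0", proxy_subnet="192.168.8.0")` ends with `dhcp-range=192.168.8.0,proxy`, where `enp1s0` is a placeholder interface name.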
Provision API
The provision-api is a FastAPI/uvicorn service that generates cloud-init data and handles heartbeat/client registration [5]. It is not exposed directly; nginx proxies requests for /cloud-init/, /heartbeat, /clients, and /onboard.service to it [7]. The API binds to the live IP read from /run/lan/ip and the port defined in provisionPort [5]. It also uses the Recreate strategy.
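The split between static and proxied paths can be expressed as a small routing function. This is a simplified model of nginx prefix matching for the four proxied locations only, assuming the upstream path mirrors the request path; `upstream_url` is a hypothetical name and not something the chart or API exposes.

```python
# Location prefixes that nginx forwards to the provision-api.
PROXIED_PREFIXES = ("/cloud-init/", "/heartbeat", "/clients", "/onboard.service")


def upstream_url(path, lan_ip, provision_port=8088):
    """Return the upstream URL for a proxied path, or None if nginx serves it itself."""
    for prefix in PROXIED_PREFIXES:
        if path.startswith(prefix):
            # The proxy target repeats the matched prefix, so the path is unchanged.
            return f"http://{lan_ip}:{provision_port}{path}"
    return None
```

With the default values, `upstream_url("/cloud-init/user-data", "10.88.0.1")` yields `http://10.88.0.1:8088/cloud-init/user-data`, while a static path such as `/images/disk.img` returns `None`.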
Data Flow and Boot Process
The provisioning stack supports a fully offline PXE boot process for GB10 devices. The flow is as follows:
- PXE Boot: The GB10 broadcasts a DHCP request. DNSmasq responds with boot information (shim image).
- TFTP/HTTP: The client downloads the signed shim via TFTP, then GRUB loads the kernel and initrd via HTTP from nginx.
- Dynamic Server Resolution: GRUB resolves the provisioning server dynamically using ${net_default_server}.
- Onboarding: On first boot, the onboard service reads the persisted server address to communicate with the provision-api.
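On the server side, the nginx configuration forwards the client's Host header so the API can report the address the client actually used. A minimal sketch of how an API handler might derive that address is shown below; `provisioning_server` is a hypothetical function, the real provision-api code is not shown in this document, and IPv6 literals are not handled.

```python
def provisioning_server(headers, fallback):
    """Derive the server address a client used from its Host header."""
    host = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "host":
            host = value.strip()
            break
    if not host:
        return fallback

    # Drop a trailing ":port" if present (IPv4 addresses and hostnames only).
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        return name
    return host
```

A client that reached the stack at `192.168.8.42:80` would therefore be told to use `192.168.8.42`, rather than an address fixed at render time.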
apiVersion: v2
name: gb10-provision
description: PXE provisioning server for bare GB10 onboarding (dnsmasq + nginx + provision-api)
type: application
version: 0.1.0
appVersion: "1.0.0"
# gb10-provision Helm values
#
# All pods run with hostNetwork: true so DHCP/TFTP/HTTP bind on the
# physical LAN interface (10.88.0.1). A fresh PXE-booting GB10 has no
# IP yet, so ordinary ClusterIP services cannot route to it.
#
# Provisioning traffic is plain HTTP over the isolated 10.88.0.0/24 LAN
# (one host on, one host off). The initrd/subiquity environment has no
# trust anchor for a self-signed leaf, so no TLS is used here. The
# admin-side trust gate lives on :4443 (cookie auth, different iface).
# LAN interface NAME assigned by `sddc gb10 serve up` before Helm install.
# Required - no default. Every daemon binds to this iface ONLY so DHCP
# answers never leak to the AP/WAN.
lanIface: ""
# LAN interface IP assigned by `sddc gb10 serve up` before Helm install.
lanIp: "10.88.0.1"
# Internal port the provision-api (FastAPI/uvicorn) listens on, reached only via
# nginx proxy_pass. Single source of truth (both containerPort and --port). NOT
# 8080 - that collides with meeting-scribe's host-server fallback on this box
# (hostNetwork pods bind real host ports). `serve up` overrides via --set.
provisionPort: 8088
# Port nginx serves static + proxied content on (HTTP only).
nginxHttpPort: 80
# HostPath pre-populated by `sddc gb10 serve up` before Helm install.
dataPath: "/data/provision"
dnsmasq:
image: "sddcinfo/gb10-dnsmasq:dev"
# "direct": full DHCP server on an isolated 10.88.0.1/24 LAN (two GB10s
# back-to-back). "proxy": proxy-DHCP on a shared L2 switch - no address
# handout, coexists with the existing DHCP, only answers PXE boot info.
mode: "direct"
dhcpRange: "10.88.0.100,10.88.0.200,12h" # used in direct mode
proxySubnet: "" # used in proxy mode, e.g. 192.168.8.0 (set by `serve up`)
tftpRoot: "/data/provision/tftp"
{{/*
Expand the name of the chart.
*/}}
{{- define "gb10-provision.name" -}}
{{- .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "gb10-provision.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels for a given component
*/}}
{{- define "gb10-provision.selectorLabels" -}}
app.kubernetes.io/name: {{ include "gb10-provision.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: {{ .component }}
{{- end }}
{{/*
lanInit - mode-aware initContainer that establishes the LAN-IP precondition and
publishes the *live* bind address to a shared volume (/run/lan/ip) for the main
(host-networked) container to read. Runs on every pod (re)start, so addressing
self-heals across reboots AND across an iface-IP change (the NM dispatcher rolls
the pods; the fresh initContainer re-discovers and republishes - see serve.py).
direct: idempotently (re)assert {{ .Values.lanIp }}/24, then publish it.
Needs NET_ADMIN (mutates host networking) - granted ONLY here.
proxy: wait for the iface to have ANY IPv4, then publish whatever it actually
is (NOT a baked value) so nginx/api bind the current lease, not a
stale rendered IP. Unprivileged.
Reuses .Values.dnsmasq.image (has iproute2). Pods are already hostNetwork:true,
so the initContainer shares the host netns. Mounts the shared lan-ip emptyDir.
*/}}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ .Release.Name }}-nginx-conf
namespace: {{ .Release.Namespace }}
labels:
{{- include "gb10-provision.labels" . | nindent 4 }}
data:
# ${LAN_IP} is NOT a Helm value - it's an envsubst placeholder filled at
# container start from /run/lan/ip (the live iface IP published by lan-setup),
# so nginx binds whatever address the box currently has, with no baked IP. The
# config body lives in the gb10-provision.nginxConf helper so the deployment
# can checksum the exact rendered content (below) and roll on any change.
nginx.conf.template: |
{{- include "gb10-provision.nginxConf" . | nindent 4 }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-nginx
namespace: {{ .Release.Namespace }}
labels:
{{- include "gb10-provision.labels" . | nindent 4 }}
app.kubernetes.io/component: nginx
spec:
replicas: 1
# hostNetwork singleton: the new pod can't bind the LAN ports while the
# old one holds them, so replace (not roll) on upgrade.
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: {{ include "gb10-provision.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: nginx
template:
metadata:
annotations:
# Roll the pod whenever the rendered nginx config changes (a new
# location, header, or port) - without this a config-only `serve up`
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-provision-api
namespace: {{ .Release.Namespace }}
labels:
{{- include "gb10-provision.labels" . | nindent 4 }}
app.kubernetes.io/component: provision-api
spec:
replicas: 1
# hostNetwork singleton: the new pod can't bind the LAN ports while the
# old one holds them, so replace (not roll) on upgrade.
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: {{ include "gb10-provision.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: provision-api
template:
metadata:
labels:
app.kubernetes.io/name: {{ include "gb10-provision.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: provision-api
spec:
hostNetwork: true
initContainers:
{{- include "gb10-provision.lanInit" . | nindent 8 }}
containers:
- name: provision-api
image: {{ .Values.provisionApi.image }}
ports:
- containerPort: {{ .Values.provisionPort }}
protocol: TCP
env:
- name: PORT
value: {{ .Values.provisionPort | quote }}
- name: DATA_PATH
value: {{ .Values.dataPath | quote }}
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-dnsmasq
namespace: {{ .Release.Namespace }}
labels:
{{- include "gb10-provision.labels" . | nindent 4 }}
app.kubernetes.io/component: dnsmasq
spec:
replicas: 1
# hostNetwork singleton: the new pod can't bind the LAN ports while the
# old one holds them, so replace (not roll) on upgrade.
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: {{ include "gb10-provision.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: dnsmasq
template:
metadata:
labels:
app.kubernetes.io/name: {{ include "gb10-provision.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: dnsmasq
spec:
# hostNetwork: binds DHCP (67/udp) and TFTP (69/udp) on the physical
# LAN interface so a PXE-booting GB10 (no IP yet) can reach them.
hostNetwork: true
{{- if .Values.nodeName }}
nodeSelector:
kubernetes.io/hostname: {{ .Values.nodeName | quote }}
{{- end }}
initContainers:
{{- include "gb10-provision.lanInit" . | nindent 8 }}
containers:
- name: dnsmasq
image: {{ .Values.dnsmasq.image }}
securityContext:
capabilities:
location /casper/ { alias /data/provision/www/casper/; autoindex on; }
location /oemdata/ { alias /data/provision/www/oemdata/; autoindex on; }
location /grub/ { alias /data/provision/www/grub/; autoindex on; }
location /repos/ { alias /data/provision/repos/; autoindex on; }
location /wheels/ { alias /data/provision/wheels/; autoindex on; }
location /images/ { alias /data/provision/images/; autoindex on; }
location /bin/ { alias /data/provision/bin/; autoindex on; }
# 100%-offline caches served on the LAN (no internet on the target).
location /hf/ { alias /data/provision/hf/; autoindex on; }
location /mise/ { alias /data/provision/mise/; autoindex on; }
location /apt/ { alias /data/provision/apt/; autoindex on; }
location /fwupd/ { alias /data/provision/fwupd/; autoindex on; }
location /manifest.yaml { alias /data/provision/manifest.yaml; }
# Proxy dynamic cloud-init + heartbeat to provision-api. Forward the client's
# Host (the address it used to reach us) so the API renders provisioning_server
# dynamically - without this, proxy_pass rewrites Host to the upstream and the
# API would echo a fixed IP. ($host is an nginx var; envsubst's ${LAN_IP}
# allow-list leaves it untouched.)
proxy_set_header Host $host;
location /cloud-init/ { proxy_pass http://${LAN_IP}:{{ .Values.provisionPort }}/cloud-init/; }
location /heartbeat { proxy_pass http://${LAN_IP}:{{ .Values.provisionPort }}/heartbeat; }
location /clients { proxy_pass http://${LAN_IP}:{{ .Values.provisionPort }}/clients; }
location /onboard.service { proxy_pass http://${LAN_IP}:{{ .Values.provisionPort }}/onboard.service; }
}
}
{{- end }}