Documentation Content

The documentation content serves as the primary informational resource for the project, detailing the history, workflows, and technical architecture of a fully automated, bare-metal Kubernetes platform provisioning system 1. This ecosystem is designed to take bare-metal servers from power-on to fully configured, production-ready Kubernetes nodes with zero manual intervention 2. The documentation is structured to guide users through the Ansible Automation Ecosystem, which is broken down into two main projects: the Provisioning Server and the Kubernetes Cluster.

The project began with a vision to create a home lab that rivals enterprise infrastructure by automating everything from power-on to application deployment 3. The development followed a six-phase timeline spanning twelve months:

  • Phase 1: Research & Planning (Months 1-2): This phase involved evaluating Kubernetes distributions, storage solutions (Ceph, GlusterFS, NFS), and networking solutions (Calico, Flannel, Cilium).
  • Phase 2: Hardware Assembly (Month 3): The team assembled four high-performance server nodes, configured a 10GbE network with a MikroTik switch, and installed NVMe drives.
  • Phase 3: Provisioning Automation (Months 4-5): This phase focused on building the PXE boot infrastructure, DHCP/TFTP/HTTP services, and cloud-init configurations to achieve zero-touch server provisioning.
  • Phase 4: Kubernetes Deployment (Months 6-8): The team built Ansible playbooks for Kubernetes installation, configured a high-availability control plane, and deployed Calico for pod networking.
  • Phase 5: Storage Layer (Months 9-10): This involved deploying the Rook operator and Ceph cluster, configuring storage pools, and implementing 3x replication.
  • Phase 6: Production Hardening (Months 11-12): The final phase implemented monitoring with Prometheus and Grafana, centralized logging with the ELK stack, and automated backups 4.
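The 3x replication chosen in Phase 5 has a direct effect on usable capacity: each object is stored three times, so usable space is roughly one third of raw space. A minimal sketch of that arithmetic follows. The node count comes from Phase 2, but the 2 TB per-node drive size is a hypothetical figure chosen for illustration, and the calculation ignores Ceph metadata overhead and the free-space headroom a real cluster needs.

```python
def usable_capacity_tb(nodes: int, drive_tb_per_node: float, replicas: int) -> float:
    """Approximate usable capacity of a replicated pool, ignoring overhead."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    raw_tb = nodes * drive_tb_per_node  # total raw capacity across all nodes
    return raw_tb / replicas            # every object is stored `replicas` times


# Four nodes (Phase 2), a hypothetical 2 TB of NVMe per node, 3x replication.
print(usable_capacity_tb(nodes=4, drive_tb_per_node=2.0, replicas=3))
```

Under these assumed numbers, 8 TB of raw NVMe yields a little under 3 TB of usable storage, which is the trade-off accepted in exchange for surviving the loss of replicas.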

The infrastructure has achieved 99.9% uptime, completed over 100 successful deployments, and recorded zero data loss incidents 3. Future improvements include upgrading to 25GbE networking, implementing GitOps with ArgoCD or Flux, and adding a service mesh with Istio or Linkerd.
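To put the uptime figure in concrete terms, the short calculation below converts an uptime percentage into a downtime budget. It is only a sketch: the source does not state the measurement window, so a 365-day year is assumed here.

```python
def downtime_budget_hours(uptime_percent: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted over a period at a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Assumed measurement window: one 365-day year (8760 hours).
print(round(downtime_budget_hours(99.9), 2))  # 8.76
```

At 99.9% uptime, the budget over an assumed year is therefore just under nine hours of total downtime.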

The system is composed of two primary Ansible projects that handle the lifecycle of the infrastructure. The first component is the Provisioning Server, a dedicated machine that provides network services such as DHCP, TFTP, and HTTP to automate the installation of the Ubuntu operating system on new machines 2. The second component is the Kubernetes Cluster, which consists of Ansible roles that configure the newly provisioned servers into a complete, production-ready cluster.

The following diagram illustrates the high-level control flow between the provisioning infrastructure and the target cluster nodes.

[Diagram: high-level control flow between the provisioning infrastructure and the target cluster nodes]
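The same control flow can be approximated in code as an ordered sequence of stages. This is an illustrative model only: the stage names are hypothetical, and the ordering assumes a conventional PXE boot sequence (DHCP, then TFTP, then HTTP), which the documentation implies but does not spell out.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Hypothetical lifecycle stages of a node, in the order they occur."""
    POWER_ON = 1
    DHCP_LEASE = 2          # node receives an address and boot details
    TFTP_BOOTLOADER = 3     # node downloads its network bootloader
    HTTP_INSTALL = 4        # installer and cloud-init data fetched over HTTP (assumed)
    UBUNTU_INSTALLED = 5
    ANSIBLE_CONFIGURED = 6  # Kubernetes Cluster roles applied to the node
    CLUSTER_NODE = 7


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None at the end of the flow."""
    stages = list(Stage)  # Enum members iterate in definition order
    position = stages.index(stage)
    return stages[position + 1] if position + 1 < len(stages) else None


def responsible_project(stage: Stage) -> str:
    """Name the Ansible project that drives a node through the given stage."""
    if stage.value <= Stage.UBUNTU_INSTALLED.value:
        return "Provisioning Server"
    return "Kubernetes Cluster"


# Walk a node through the whole flow, printing which project drives each stage.
current: Optional[Stage] = Stage.POWER_ON
while current is not None:
    print(f"{current.name}: {responsible_project(current)}")
    current = next_stage(current)
```

The split in `responsible_project` mirrors the two-project structure described above: everything up to a freshly installed Ubuntu system belongs to the Provisioning Server, and everything after it belongs to the Kubernetes Cluster roles.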

The documentation covers the architecture, configuration, and execution steps of the system. The sidebar navigation leads to detailed explanations of each component. The project emphasizes Infrastructure as Code, using Ansible to keep the process repeatable and version-controlled 3. Key lessons learned include the value of an incremental approach, which keeps learning focused, and of writing documentation, which solidifies understanding.