Building the Bletchley Cluster: Installing Talos Linux on TuringPi 2
Complete guide to installing Talos Linux v1.12.4 on TuringPi 2 RK1 modules — VLAN configuration, HA control plane, and every command along the way.
Introduction
Three weeks ago I wrote about planning this reinstallation. Today I'm documenting the actual installation — every command, every error, every decision. The cluster is running. Here's exactly how it happened.
🏠 This is part of the Homelab Journey series - building a production Kubernetes cluster from scratch.
Other posts in this series:
- My TuringPi Cluster Hardware
- Installing Talos Linux - Learning from My First Attempt
- Building the Bletchley Cluster: Installing Talos Linux on TuringPi 2 (you are here)
- Upgrading Talos Linux Nodes
What I'm Building
The cluster is named bletchley — four RK1 modules in a TuringPi 2, running Talos Linux v1.12.4 on Kubernetes v1.35.0.
| Node | Role | IP | Storage |
|---|---|---|---|
| rock1 | Control Plane | 10.0.140.11 | 32GB eMMC + 250GB NVMe |
| rock2 | Control Plane | 10.0.140.12 | 32GB eMMC + 250GB NVMe |
| rock3 | Control Plane | 10.0.140.13 | 32GB eMMC + 250GB NVMe + 2x 900GB SATA |
| rock4 | Worker | 10.0.140.14 | 32GB eMMC + 250GB NVMe |
Key design decisions:
- Nodes live on VLAN 140 (10.0.140.0/24) — isolated from the rest of my network
- VIP 10.0.140.10 floats across control plane nodes for HA API access
- Talos installs to eMMC, leaving NVMe and SATA free for Longhorn storage
- DHCP reservations give fixed IPs without static configuration in Talos
- allowSchedulingOnControlPlanes: true — small cluster, no reason to waste resources
Phase 0: Preparation
Before touching the nodes, a few things need to be in place on the network and on the machine I'll be managing the cluster from.
Network
Required:
- VLAN 140 (optional) — I isolate the cluster nodes on a dedicated VLAN, but this isn't required. If you're running a flat network, the nodes will simply stay on your management network and you can skip the VLAN configuration in the patch files. If you do use a VLAN: the switch port connected to the TuringPi needs to be configured as a trunk port, carrying both the untagged management VLAN and tagged VLAN 140. Nodes boot untagged (picking up a temporary IP on the management network), receive their config, and then come up on VLAN 140. Without a trunk port, the node will disappear from the network after the config is applied and never come back.
- DHCP server — required regardless of whether you use a VLAN. I run DHCP on VLAN 140 with reservations mapping each node's MAC address to a fixed IP. On a flat network the setup is the same, just without the VLAN scope. The reservations are what matter — Talos config stays simple because DHCP always hands out the same address to the same node.
- VIP 10.0.140.10 excluded from the DHCP pool. The VIP floats between control plane nodes and must never be assigned dynamically to anything else.
- Internet access from VLAN 140 — the nodes need to reach DNS, HTTPS (for pulling container images), and NTP on first boot.
Optional:
- DNS entries — not required, but I prefer connecting by name rather than IP. I created rock1-4.vluwte.nl pointing to the node IPs and bletchley.vluwte.nl pointing to the VIP. This also prepares the ground for SSL certificates later.
- NTP server — Talos defaults to public NTP pools. I run my own NTP server (ntp.luwte.net) and configured the nodes to use it. If you don't have an internal NTP server, simply leave this out of the patch files and Talos will use the defaults.
- SSL certificates — not needed for installation, but worth planning for. Having proper DNS entries in place now means you can add certificates later without reconfiguring the cluster. I'll cover this in a future post.
Laptop
Install talosctl and kubectl:
# macOS
brew install siderolabs/tap/talosctl
brew install kubectl
# Verify
talosctl version --client
# Client: Tag: v1.12.4
kubectl version --client
# Client Version: v1.35.0
Both are needed — talosctl for talking to Talos nodes, kubectl for talking to Kubernetes once the cluster is up.
Create the working directory where all cluster files will live:
mkdir -p ~/talos-cluster/bletchley
cd ~/talos-cluster/bletchley
Phase 1: Building the Talos Image
Talos images aren't one-size-fits-all. The RK1 modules need a specific ARM64 image with the right hardware support and the system extensions I'll need later.
I used Talos Image Factory to build a custom image with three extensions:
- iscsi-tools — required for Longhorn storage
- util-linux-tools — for Longhorn volume trimming via fstrim
- nfsd — for future NFS exports from the SATA drives
The resulting schematic ID is d7a56218964f9ec22ae62c243a60d23a76853a82dc435eec34ed1cb2b5aabfe3. I'm noting this here because it's how you reproduce this exact image — if something breaks six months from now and I need to reflash, I come back to this ID.
# Download the compressed image (laptop)
curl -L -o metal-arm64.raw.xz \
"https://factory.talos.dev/image/d7a56218964f9ec22ae62c243a60d23a76853a82dc435eec34ed1cb2b5aabfe3/v1.12.4/metal-arm64.raw.xz"
# Decompress for CLI flashing (BMC requires uncompressed)
xzcat metal-arm64.raw.xz > metal-arm64.raw
# Copy to TuringPi BMC
scp metal-arm64.raw root@turingpi:/mnt/sdcard/talos-1.12.4-with-nfsd+util-linux-tools+iscsi/
Why decompress? The TuringPi BMC CLI (tpi flash) requires a raw uncompressed image. The web UI accepts .xz but the CLI doesn't. I learned this the first time around.
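Before pushing a multi-gigabyte image to the BMC, it's worth checking it didn't get corrupted in transit. A minimal sketch, assuming you noted the expected checksum from the Image Factory download page (verify_image is my own hypothetical helper, not part of talosctl or tpi; on macOS, swap sha256sum for shasum -a 256):

```shell
# Compare a file's SHA-256 against a checksum recorded earlier.
# (verify_image is a hypothetical helper, not a talosctl/tpi command.)
verify_image() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Usage (the checksum argument is a placeholder):
# verify_image metal-arm64.raw <expected-sha256>
```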
Phase 2: Flashing the Nodes
With the image on the BMC's SD card, flashing is straightforward. I used the CLI for nodes 1-3 and the web UI for node 4 (just to document both methods).
CLI Flashing (Nodes 1-3)
SSH into the BMC first:
ssh root@turingpi
Then for each node:
# Power off the node (if running)
tpi power off -n 1
# Flash the image
tpi flash -l -i /mnt/sdcard/talos-1.12.4-with-nfsd+util-linux-tools+iscsi/metal-arm64.raw -n 1
# Power on
tpi power on -n 1
# Watch the UART to confirm boot
tpi uart -n 1 get
The -l flag is "local" — reads from the BMC filesystem rather than fetching over the network. Flashing takes a couple of minutes per node.
What I saw in UART after boot:
[talos] entering maintenance mode
That's the Talos maintenance mode — the node is running, waiting for configuration. It picks up a temporary untagged IP from DHCP (mine were 192.168.0.110-113).
Web UI Flashing (Node 4)
Navigate to https://turingpi.luwte.net/ → Flash Node tab → select Node 4 → fill in the file path and optionally the SHA-256 for verification, then click Install OS:
The web UI uploads the .xz file to the BMC first. It decompresses on the fly — no need to decompress manually like the CLI method:
Once the upload is complete, the BMC verifies the SHA-256 checksum and writes the uncompressed image to the node's eMMC. This is the longer of the two phases:
After flashing completes, the node is powered off. To power it back on, click Edit first — the power toggles are read-only until edit mode is active:
Verifying All Nodes Are Ready
After flashing all four nodes:
| Node | Maintenance Mode IP | Status |
|---|---|---|
| rock1 | 192.168.0.110 | Maintenance ✓ |
| rock2 | 192.168.0.111 | Maintenance ✓ |
| rock3 | 192.168.0.112 | Maintenance ✓ |
| rock4 | 192.168.0.113 | Maintenance ✓ |
Node 3 showed a SATA handshake error during boot — it recovered immediately. It's the SATA controller initialising the two 900GB drives. Not a problem.
Phase 3: Generating Configuration
Generating the Base Configs
talosctl gen config bletchley https://10.0.140.10:6443 \
--output-dir ~/talos-cluster/bletchley
This generates three files:
- controlplane.yaml — base control plane configuration
- worker.yaml — base worker configuration
- talosconfig — credentials for talosctl to connect to the cluster
The URL https://10.0.140.10:6443 is the VIP — this is baked into the cluster's TLS certificates, so it's important to get this right from the start.
The Control Plane Patch
The base configs need customising for my setup. Create cp.patch.yaml:
machine:
  install:
    disk: /dev/mmcblk0
  time:
    servers:
      - ntp.luwte.net
  network:
    interfaces:
      - interface: end0
        vlans:
          - vlanId: 140
            dhcp: true
            vip:
              ip: 10.0.140.10
cluster:
  allowSchedulingOnControlPlanes: true
Why each setting:
- disk: /dev/mmcblk0 — explicitly install to eMMC, not NVMe. Without this, Talos might pick the NVMe and I'd lose my Longhorn storage disk.
- time.servers — my internal NTP server. Talos defaults to public NTP pools; I want time sync going to my own server.
- interface: end0 — the RK1's ethernet interface. The boot logs showed end0, not eth0.
- vlanId: 140 — put the node on VLAN 140 with DHCP. My switch has DHCP reservations that give each node a fixed IP.
- vip.ip: 10.0.140.10 — the floating VIP shared across control plane nodes.
- allowSchedulingOnControlPlanes: true — four nodes total, I want workloads running everywhere.
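Before sending anything to a node, you can render the merged result locally and validate it offline. A sketch (render_and_validate is my own wrapper; the two subcommands it calls, machineconfig patch and validate, ship with talosctl, and the output filename is arbitrary):

```shell
# Hypothetical helper wrapping two stock talosctl subcommands:
# merge the patch into the base config, then validate the result offline.
render_and_validate() {
  talosctl machineconfig patch controlplane.yaml \
    --patch @cp.patch.yaml \
    --output controlplane.patched.yaml
  talosctl validate --config controlplane.patched.yaml --mode metal
}
```

Catching a bad indent or a typo'd key here is much cheaper than debugging a node that never comes back on VLAN 140.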
The Worker Patch
Create worker.patch.yaml:
machine:
  install:
    disk: /dev/mmcblk0
  time:
    servers:
      - ntp.luwte.net
  network:
    interfaces:
      - interface: end0
        vlans:
          - vlanId: 140
            dhcp: true
Same as control plane but without the VIP — workers don't participate in VIP management.
A Note on IPv6
The nodes pick up IPv6 addresses automatically (2a10:3781:4bc9:...). If your network doesn't have IPv6 routing on this subnet, you'll see NTP timeout errors like this in the boot logs:
time query error with server "2a10:3781:4bc9:0:92ec:77ff:fe13:a988": i/o timeout
This is harmless — Talos falls back to IPv4. Disabling IPv6 in the patch is an option if it bothers you, but it resolves itself.
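For reference, one way to disable it is via kernel sysctls in the machine patch. This is a sketch I haven't applied to this cluster, so treat it as untested:

```yaml
# Untested sketch: disable IPv6 on the node via sysctls in the machine patch.
machine:
  sysctls:
    net.ipv6.conf.all.disable_ipv6: "1"
    net.ipv6.conf.default.disable_ipv6: "1"
```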
Phase 4: Applying Configuration
Configure talosctl Endpoints
The generated talosconfig has credentials but no endpoints — the nodes didn't have their final IPs when the config was generated. Set them now:
export TALOSCONFIG=~/talos-cluster/bletchley/talosconfig
talosctl config endpoint 10.0.140.11 10.0.140.12 10.0.140.13
talosctl config node 10.0.140.11
Apply to the Control Plane Nodes
Apply to each node using the maintenance mode IP (the temporary untagged address):
# rock1
talosctl apply-config --insecure \
--nodes 192.168.0.110 \
--config-patch @cp.patch.yaml \
--file controlplane.yaml
# rock2
talosctl apply-config --insecure \
--nodes 192.168.0.111 \
--config-patch @cp.patch.yaml \
--file controlplane.yaml
# rock3
talosctl apply-config --insecure \
--nodes 192.168.0.112 \
--config-patch @cp.patch.yaml \
--file controlplane.yaml
Expected behaviour: The command sends the config and then the connection drops with a timeout or graceful_stop error. This is normal — the node received the config, removed its untagged IP, and came back up on VLAN 140. You'll never see a clean success response.
error applying new configuration: rpc error: code = Unavailable desc = closing transport due
to: connection error ... received prior goaway: code: NO_ERROR, debug data: "graceful_stop"
That error means success.
What Happens During Boot
Watching the UART during rock1's first boot with the new config shows the network transition clearly:
[talos] removed address 192.168.0.110/24 from "end0"
[talos] created new link ... "end0.140", "kind": "vlan"
[talos] assigned address "10.0.140.11/24" ... "link": "end0.140"
[talos] setting hostname ... "hostname": "rock1", "domainname": "vluwte.nl"
[talos] created route ... "gateway": "10.0.140.1" ... "link": "end0.140"
The node drops off the untagged network, creates a VLAN subinterface, and comes back on 10.0.140.11. The DHCP reservation kicks in and it gets exactly the IP I expected.
The initial NTP lookup failure (server misbehaving) is also normal — it happens during the brief window when DNS resolvers are switching from the DHCP-provided defaults (1.1.1.1, 8.8.8.8) to my internal servers (10.0.0.4, 10.0.0.13). It resolves itself.
Apply Worker Config to rock4
talosctl apply-config --insecure \
--nodes 192.168.0.113 \
--config-patch @worker.patch.yaml \
--file worker.yaml
Phase 5: Bootstrapping etcd
With all four nodes configured and running, etcd is waiting on every node. I can see this in the UART logs:
etcd is waiting to join the cluster, if this node is the first node in the cluster,
please run `talosctl bootstrap` against one of the following IPs:
[10.0.140.11 ...]
Apply config to all nodes before bootstrapping. Bootstrap tells one node to initialise a new etcd cluster. If other control plane nodes aren't ready yet, they'll fail to join.
Before bootstrapping I ran a health check to confirm all three control plane nodes were in the expected state:
igor@granite ~ % talosctl -n 10.0.140.11 health
discovered nodes: ["10.0.140.11" "10.0.140.12" "10.0.140.13"]
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: 3 errors occurred:
* 10.0.140.11: service "etcd" not in expected state "Running": current state [Preparing] Running pre state
* 10.0.140.12: service "etcd" not in expected state "Running": current state [Preparing] Running pre state
* 10.0.140.13: service "etcd" not in expected state "Running": current state [Preparing] Running pre state
This is exactly what I wanted to see. Two things to confirm before proceeding: all three nodes are discovered (10.0.140.11, 10.0.140.12, 10.0.140.13), and etcd is in Preparing state on all of them — meaning they're waiting for bootstrap, not stuck or erroring.
Then bootstrap:
igor@granite ~ % talosctl bootstrap -n 10.0.140.11
igor@granite ~ %
No output — just the prompt returning immediately. That's the expected success response. The bootstrap request was sent to rock1 and etcd will now initialise in the background.
Now watch it take effect:
igor@granite ~ % talosctl -n 10.0.140.11 health
discovered nodes: ["10.0.140.10" "10.0.140.12" "10.0.140.13"]
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: ...
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: ...
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: ...
waiting for apid to be ready: OK
waiting for all nodes memory sizes: ...
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: ...
waiting for all nodes disk sizes: OK
waiting for no diagnostics: ...
waiting for no diagnostics: OK
waiting for kubelet to be healthy: ...
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: ...
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: ...
waiting for all k8s nodes to report: can't find expected node with IPs ["10.0.140.12" ...]
waiting for all k8s nodes to report: OK
waiting for all control plane static pods to be running: ...
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: ...
waiting for all control plane components to be ready: can't find expected node with IPs ["10.0.140.10" "10.0.140.11" ...]
waiting for all control plane components to be ready: can't find expected node with IPs ["10.0.140.12" ...]
waiting for all control plane components to be ready: OK
waiting for all k8s nodes to report ready: ...
waiting for all k8s nodes to report ready: OK
waiting for kube-proxy to report ready: ...
waiting for kube-proxy to report ready: OK
waiting for coredns to report ready: ...
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: ...
waiting for all k8s nodes to report schedulable: OK
A few things to note in this output. The discovered nodes line now shows 10.0.140.10 (the VIP) instead of 10.0.140.11 — rock1 is now reachable via the VIP, which is correct. The can't find expected node errors are transient — Kubernetes was still registering nodes at that point, and they resolved themselves within seconds. Every check ends with OK, which is what matters.
Phase 6: Verifying the Cluster
Get kubeconfig
The cluster is running but I can't talk to it with kubectl yet — for that I need a kubeconfig file. Talos generates this from the cluster itself, pulling the certificates and endpoint information that kubectl needs to authenticate and connect. I pull it directly from rock1 and store it in the same directory as the other cluster files:
talosctl kubeconfig ~/talos-cluster/bletchley/kubeconfig -n 10.0.140.11
export KUBECONFIG=~/talos-cluster/bletchley/kubeconfig
The export sets the environment variable for the current session so kubectl knows which config to use. Once this is added to ~/.zshrc (as covered in the "What's in the Files" section), this won't be needed manually again.
Check Nodes
kubectl get nodes
NAME STATUS ROLES AGE VERSION
rock1 Ready control-plane 8m37s v1.35.0
rock2 Ready control-plane 8m31s v1.35.0
rock3 Ready control-plane 9m2s v1.35.0
rock4 Ready <none> 2m20s v1.35.0
All four nodes Ready. Kubernetes v1.35.0.
Check System Pods
igor@granite bletchley % kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7859998f6-68v26 1/1 Running 0 9m23s
kube-system coredns-7859998f6-844kv 1/1 Running 0 9m23s
kube-system kube-apiserver-rock1 1/1 Running 0 8m50s
kube-system kube-apiserver-rock2 1/1 Running 0 8m53s
kube-system kube-apiserver-rock3 1/1 Running 0 9m25s
kube-system kube-controller-manager-rock1 1/1 Running 2 (9m50s ago) 8m50s
kube-system kube-controller-manager-rock2 1/1 Running 2 (9m44s ago) 8m53s
kube-system kube-controller-manager-rock3 1/1 Running 2 (9m41s ago) 9m25s
kube-system kube-flannel-2dz7z 1/1 Running 0 9m1s
kube-system kube-flannel-5l4vr 1/1 Running 0 2m44s
kube-system kube-flannel-bhzqr 1/1 Running 0 9m26s
kube-system kube-flannel-mg5jd 1/1 Running 0 8m55s
kube-system kube-proxy-jk8fv 1/1 Running 0 2m44s
kube-system kube-proxy-ntjrq 1/1 Running 0 8m55s
kube-system kube-proxy-q4xqm 1/1 Running 0 9m26s
kube-system kube-proxy-vnzt2 1/1 Running 0 9m1s
kube-system kube-scheduler-rock1 1/1 Running 3 (9m33s ago) 8m50s
kube-system kube-scheduler-rock2 1/1 Running 2 (9m44s ago) 8m53s
kube-system kube-scheduler-rock3 1/1 Running 2 (9m41s ago) 9m25s
igor@granite bletchley %
Everything running. A quick sanity check on what's here: three kube-apiserver, kube-controller-manager and kube-scheduler pods — one per control plane node. Four kube-flannel and kube-proxy pods — one per node including the worker. Two coredns pods for DNS within the cluster.
The kube-controller-manager and kube-scheduler pods show 2-3 restarts each — this is normal during bootstrap while they wait for etcd and the API server to stabilise.
Check Node Details
kubectl get nodes tells me the nodes are Ready, but it doesn't tell me much else. kubectl describe nodes gives a much richer picture — labels, taints, resource capacity, what's running on each node, and annotations set by Talos and Flannel. I use it as a final sanity check to confirm the cluster is configured the way I intended, not just that it's running.
A few things worth noting:
Extensions are confirmed on every node:
extensions.talos.dev/iscsi-tools=v0.2.0
extensions.talos.dev/nfsd=v1.12.4
extensions.talos.dev/util-linux-tools=2.41.2
These annotations confirm the custom image from Phase 1 was applied correctly on every node. The Image Factory schematic ID is also annotated on each node — a built-in audit trail that ties back to exactly what was built and when. If I need to reflash six months from now, this tells me which schematic to use.
Resources per node:
- 8 CPUs (7950m allocatable)
- ~8GB RAM (~7.3GB allocatable)
- 110 pods capacity
The difference between total and allocatable is what Talos and the Kubernetes system components reserve for themselves. 7950m out of 8000m CPUs and ~7.3GB out of ~8GB RAM is reasonable — the overhead is small.
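A quicker way to eyeball this than scrolling through full describe output is kubectl's custom-columns formatting. A sketch (show_allocatable is my own wrapper; the JSONPath fields are from the standard Node API):

```shell
# Hypothetical helper: show allocatable CPU, memory, and pod capacity per node.
show_allocatable() {
  kubectl get nodes -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory,PODS:.status.allocatable.pods'
}
```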
Taints on control plane nodes:
node-role.kubernetes.io/control-plane:NoSchedule
Normally this taint prevents regular workloads from being scheduled on control plane nodes. Because allowSchedulingOnControlPlanes: true is set in the patch, Talos automatically adds a corresponding toleration that overrides this — so workloads can land on all four nodes regardless.
Flannel and pod networking:
flannel.alpha.coreos.com/public-ip: 10.0.140.11
flannel.alpha.coreos.com/backend-type: vxlan
Flannel is the Container Network Interface (CNI) — the component responsible for pod-to-pod networking across the cluster. Without a CNI, pods on different nodes can't talk to each other. Talos installs Flannel by default as part of bootstrap.
It operates in VXLAN mode here, which means it creates an overlay network that tunnels pod traffic between nodes using UDP encapsulation on top of the existing node network. Each node gets its own subnet from the PodCIDR range (10.244.x.0/24 by default), and Flannel routes traffic between those subnets. The public-ip annotation shows which node IP Flannel is using as the tunnel endpoint — confirming it picked up the correct VLAN 140 address and not the old management IP.
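The per-node subnet assignment can be confirmed with a jsonpath query over the Node spec. A sketch (show_pod_cidrs is my own wrapper; .spec.podCIDR is the standard Kubernetes field):

```shell
# Hypothetical helper: print each node's name and its assigned PodCIDR.
show_pod_cidrs() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
}
```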
Powering Down the Cluster
Talos isn't a traditional OS you can just pull the plug on — shut down nodes gracefully to avoid etcd corruption and filesystem issues.
Since allowSchedulingOnControlPlanes: true is set, workloads can run on all four nodes, so drain all of them first:
# 1. Drain all nodes (evict workloads gracefully)
kubectl drain rock4 --ignore-daemonsets --delete-emptydir-data
kubectl drain rock3 --ignore-daemonsets --delete-emptydir-data
kubectl drain rock2 --ignore-daemonsets --delete-emptydir-data
kubectl drain rock1 --ignore-daemonsets --delete-emptydir-data
--ignore-daemonsets skips DaemonSet pods like Flannel and kube-proxy that run on every node by design and can't be moved. --delete-emptydir-data allows eviction of pods using temporary local storage — that data is lost, which is fine during a shutdown.
# 2. Shut down nodes via Talos (worker first, VIP holder last)
# First, check which node currently holds the VIP:
talosctl -n rock1.vluwte.nl,rock2.vluwte.nl,rock3.vluwte.nl,rock4.vluwte.nl get addresses | grep 10.0.140.10
rock1.vluwte.nl network AddressStatus end0.140/10.0.140.10/32 1 10.0.140.10/32 end0.140
The node name in the first column is the current VIP holder — shut that one down last.
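That grep-by-eye step can be scripted. A sketch (find_vip_holder is my own wrapper around the same talosctl get addresses call):

```shell
# Hypothetical helper: print the node currently advertising the VIP,
# i.e. the one to shut down last.
find_vip_holder() {
  talosctl -n rock1.vluwte.nl,rock2.vluwte.nl,rock3.vluwte.nl,rock4.vluwte.nl \
    get addresses | awk '$0 ~ /10\.0\.140\.10\/32/ {print $1; exit}'
}
```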
talosctl shutdown -n 10.0.140.14
talosctl shutdown -n 10.0.140.13
talosctl shutdown -n 10.0.140.12
talosctl shutdown -n 10.0.140.11 # VIP holder last
# 3. Power off the TuringPi board via BMC
tpi power off -n 4
tpi power off -n 3
tpi power off -n 2
tpi power off -n 1
To bring it back up, power on in any order — Talos and etcd handle the rest automatically:
tpi power on -n 1
tpi power on -n 2
tpi power on -n 3
tpi power on -n 4
What's in the Files
All cluster access lives in two files — talosconfig for talosctl and kubeconfig for kubectl. If I lose these, or need to manage the cluster from another machine, these are the files to copy over. I'm backing them up somewhere safe; controlplane.yaml and talosconfig contain cluster secrets and should be treated like passwords.
~/talos-cluster/bletchley/
├── controlplane.yaml # Control plane base config (contains cluster secrets)
├── worker.yaml # Worker base config
├── talosconfig # talosctl credentials
├── kubeconfig # kubectl credentials
├── cp.patch.yaml # Control plane customisation patch
└── worker.patch.yaml # Worker customisation patch
Using the Files from Another Machine
To manage the cluster from a different machine, copy both credential files over and point the tools at them:
scp ~/talos-cluster/bletchley/talosconfig user@othermachine:~/talos-cluster/bletchley/
scp ~/talos-cluster/bletchley/kubeconfig user@othermachine:~/talos-cluster/bletchley/
Then set the environment variables and configure the endpoints:
export TALOSCONFIG=~/talos-cluster/bletchley/talosconfig
export KUBECONFIG=~/talos-cluster/bletchley/kubeconfig
talosctl config endpoint 10.0.140.11 10.0.140.12 10.0.140.13
talosctl config node 10.0.140.11
Setting Environment Variables at Login
Rather than exporting these variables every session, I add them to my shell profile so they're set automatically at login.
In ~/.zshrc or ~/.bashrc:
# Talos / Kubernetes - bletchley cluster
export TALOSCONFIG=~/talos-cluster/bletchley/talosconfig
export KUBECONFIG=~/talos-cluster/bletchley/kubeconfig
Then reload the shell:
source ~/.zshrc # or source ~/.bashrc
Now kubectl get nodes and talosctl version work straight away in any new terminal session.
Lessons Learned
A hanging talosctl apply-config can be safely interrupted. When applying config to rock3, the command hung after the connection dropped. Rather than waiting for the timeout, I opened a second terminal and checked whether the node had come up correctly on VLAN 140:
talosctl -n rock1.vluwte.nl,rock2.vluwte.nl,rock3.vluwte.nl get members
Rock3 was listed with the right IP and hostname — the work was already done. A ^C to kill the hanging command was all that was needed. The apply itself takes only seconds; if the node looks right when you expect it to be ready, there's no need to wait on a timeout.
The graceful_stop error is success. Every time I applied a config, the connection dropped with a timeout. I kept second-guessing it. It's fine — the node is transitioning networks.
Apply all nodes before bootstrapping. I bootstrapped after applying rock1's config in an earlier attempt. The bootstrap failed because rock2 and rock3 weren't ready to join. The right approach is to wait until all control plane nodes are on VLAN 140 and showing etcd is waiting.
The NTP hostname lookup failure is transient. ntp.luwte.net failing to resolve during boot isn't a real problem — it happens in the 200ms window between DNS resolvers switching. Talos retries and succeeds once the internal DNS servers are configured.
The VIP appears in discovery listings as an additional address for rock1. The health check's discovered nodes initially showed 10.0.140.10 for rock1 rather than 10.0.140.11 directly. This confused me briefly — it's just how VIP assignment works with Talos.
What's Next
The cluster is running but it has no persistent storage. All four nodes have 250GB NVMe drives sitting completely unused, and rock3 has two 900GB SATA drives. The next step is installing Longhorn to pool that storage into distributed persistent volumes.
After Longhorn: MetalLB for load balancing, then deploying the first real workloads.
Conclusion
Four RK1 modules, Talos v1.12.4, Kubernetes v1.35.0, HA control plane, running on VLAN 140. The bletchley cluster is live.
The whole process from flashing to kubectl get nodes took about four hours — including a failed bootstrap attempt and carefully documenting every step. Without documentation it would have taken less time but I'd have no idea how to reproduce it.
← Previous: Talos: First Attempt
→ Next: Upgrading Talos Linux Nodes
Questions or suggestions? Leave a comment below or reach out at igor@vluwte.nl.