Setting Up Longhorn Storage on the Bletchley Cluster
Installing Longhorn distributed storage on a Talos Linux cluster: NVMe preparation, Helm install, and why the namespace label matters before anything else.
Introduction
With the cluster running and all four nodes upgraded with the right extensions, the next step was storage. Kubernetes itself doesn't manage persistent storage — it needs a storage layer to provision volumes for workloads. On the Bletchley cluster, that storage layer is Longhorn.
🏠 This is part of the Homelab Journey series - building a production Kubernetes cluster from scratch.
Other posts in this series:
- My TuringPi Cluster Hardware
- Installing Talos Linux: First Attempt
- Building Bletchley
- Building the Bletchley Cluster
- Setting Up Longhorn Storage on the Bletchley Cluster (you are here)
What is Longhorn?
Longhorn is a distributed block storage system for Kubernetes. It takes the local NVMe disks on the nodes and makes them available as a unified storage pool — when a workload needs persistent storage, Longhorn provisions a volume, replicates it across nodes, and presents it to Kubernetes through the standard CSI (Container Storage Interface).
The practical result: a pod running on any node can access its persistent data, even if the node it started on isn't the one the data originally landed on. If a node goes down, the volume stays accessible via its replica on another node.
I chose Longhorn over alternatives like Rook/Ceph because it's lighter, Kubernetes-native, and well-suited to a four-node homelab cluster. Ceph is powerful but designed for much larger deployments — using it here would be swinging a sledgehammer where Longhorn is the right-sized tool.
What Was Already in Place
When Talos was installed on the Bletchley cluster, the image was built with three extensions that Longhorn requires: iscsi-tools, nfsd, and util-linux-tools. Without these, Longhorn won't function correctly — they need to be part of the Talos image before storage can be set up.
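For reference, these extensions are selected at image-build time. Assuming the image was built through the Talos Image Factory, the schematic looks roughly like this (a sketch, not the exact file used for Bletchley):

```yaml
# Image Factory schematic (sketch) — bakes the system extensions
# Longhorn depends on into the Talos image before installation:
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/nfsd
      - siderolabs/util-linux-tools
```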
The second piece — which I'll cover in detail here — was preparing the NVMe disks and telling Talos how to expose them to pods.
Preparing the Nodes
The NVMe Mount
Talos is an immutable operating system. Pods can't freely access host paths the way they might on a traditional Linux system — you have to explicitly tell Talos which paths are allowed. Longhorn needs a dedicated directory on each node's NVMe disk to store volume data, and that path needs to be declared in the machine config.
First I confirmed the disk layout on the nodes:
talosctl -n rock1.vluwte.nl get disks
All four nodes showed the same setup. Trimming the output to the relevant lines:
NODE TYPE ID SIZE READ ONLY TRANSPORT MODEL
rock1.vluwte.nl Disk mmcblk0 31 GB false mmc
rock1.vluwte.nl Disk nvme0n1 250 GB false nvme KINGSTON SNVS250G
mmcblk0 — 31GB eMMC, where Talos lives. nvme0n1 — 250GB Kingston NVMe, for application data. Clean separation exactly as planned.
I created longhorn-patch.yaml to apply to all four nodes:
machine:
  disks:
    - device: /dev/nvme0n1
      partitions:
        - mountpoint: /var/mnt/longhorn
  kubelet:
    extraMounts:
      - destination: /var/mnt/longhorn
        type: bind
        source: /var/mnt/longhorn
        options:
          - bind
          - rshared
          - rw
The disks section tells Talos to partition and mount the NVMe at /var/mnt/longhorn. The extraMounts section is what allows Longhorn's pods to access that path from inside their containers — and it's worth understanding what each option actually does.
bind creates a bind mount, making the host path visible inside the container at the same location. Without this, the container simply can't see /var/mnt/longhorn at all.
rw makes the mount read-write. Longhorn needs to write volume data there, so read-only would be pointless.
rshared is the interesting one. By default, Linux containers are isolated from their host's mount namespace — if something gets mounted inside a container, the host doesn't see it, and vice versa. rshared changes this to bidirectional propagation: mounts created inside the Longhorn container (such as when Longhorn attaches a volume to a pod) are visible on the host, and mounts created on the host are visible inside the container. The r prefix means this propagation applies recursively to all sub-mounts under the path, not just the top level.
This matters because Longhorn's instance manager creates block device mounts dynamically when volumes are attached to pods. Those mounts need to be visible to the host's kubelet — which is what actually hands the volume off to the workload pod. Without rshared, that handoff silently fails: Longhorn thinks the volume is attached, kubelet can't see it, and the pod sits waiting forever.
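Whether propagation is actually in effect shows up in the optional-fields column of mountinfo, which carries a shared:N tag for shared mounts. A small self-contained sketch of what to look for — the sample line below is illustrative, not captured from a real node (on a live node you'd read /proc/1/mountinfo via talosctl read):

```shell
# Sample mountinfo line for the Longhorn path on a correctly configured
# node (illustrative values). The 7th whitespace-separated field holds
# the optional fields; "shared:N" means mount events propagate both
# ways under this path, which is what kubelet needs to see Longhorn's
# dynamically created volume mounts.
line='643 611 259:1 / /var/mnt/longhorn rw,relatime shared:275 - xfs /dev/nvme0n1p1 rw'
prop=$(echo "$line" | awk '{print $7}')
echo "$prop"   # prints shared:275 — propagation enabled
```

If that field instead reads something like master:N (slave propagation) or is absent entirely (private), the host-to-container handoff described above will fail.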
Applying the patch — but one node at a time. Machine config patches trigger a reboot, and if all nodes reboot simultaneously the VIP becomes unreachable. The sleep 300 gives each node five minutes to come back Ready before moving to the next:
for node in rock1 rock2 rock3 rock4; do
  talosctl -n $node.vluwte.nl patch machineconfig --patch-file longhorn-patch.yaml
  sleep 300
done
I learned this the hard way — I applied all four patches in quick succession without thinking, all nodes rebooted at once, and kubectl lost its endpoint until the VIP came back up. With no workloads running yet it was harmless, but once services are on the cluster it would be a brief outage. The loop with a sleep enforces the right behaviour automatically.
Tip: Run this in a second terminal while the loop is executing to watch each node drop out and come back Ready in real time:
kubectl get nodes --watch
You'll see the node status change to NotReady as it reboots, then back to Ready once it's up. A good way to confirm the sleep window is actually long enough before the loop moves on to the next node.
After all nodes were back up, I verified the mount:
talosctl -n rock1.vluwte.nl read /proc/mounts | grep longhorn
Output on all four nodes:
/dev/nvme0n1p1 /var/mnt/longhorn xfs rw,seclabel,relatime,inode64,logbufs=8,logbsize=32k,noquota 0 0
NVMe mounted, XFS formatted, read-write. Every node confirmed.
Installing Longhorn with Helm
Why Helm
Longhorn can be installed with a single kubectl apply command, but I chose Helm for the same reason I document everything: reproducibility. With Helm, the installation is defined in a longhorn-values.yaml file that lives in ~/talos-cluster/bletchley/ alongside the Talos config files. Anyone (including future me) can recreate the exact same setup by running helm install with that file. Upgrades are also cleaner — helm upgrade handles the diff rather than re-applying a raw manifest.
I also pinned the version explicitly. Without --version, Helm installs whatever is current at the time — fine until you need to reinstall six months later and the defaults have changed. Version 1.10.2 is pinned and documented. More on versioning and upgrade strategy in the upcoming deep dive post.
The values file — longhorn-values.yaml
defaultSettings:
  defaultReplicaCount: 2
  defaultDataPath: /var/mnt/longhorn
Two settings: replica count and data path. The data path /var/mnt/longhorn is Talos-specific — Longhorn's default assumes a standard Linux path that Talos doesn't use. The replica count of 2 is a deliberate choice; Longhorn's own best practices documentation recommends 2 replicas for homelab and resource-constrained clusters as the right balance between redundancy and storage efficiency. More on the reasoning behind that number in a planned future deep dive post.
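The storage-efficiency side of that tradeoff is easy to put numbers on. A back-of-envelope sketch for this cluster, assuming volumes distribute evenly across nodes and ignoring filesystem and metadata overhead:

```shell
# 4 nodes x 250GB NVMe, with every volume stored twice (replica count 2):
nodes=4; per_node_gb=250; replicas=2
raw_gb=$(( nodes * per_node_gb ))
usable_gb=$(( raw_gb / replicas ))
echo "raw=${raw_gb}GB usable=${usable_gb}GB"   # raw=1000GB usable=500GB
```

With 3 replicas the same maths gives roughly 333GB usable — noticeably tighter on 250GB disks, which is part of why 2 is the recommended number for clusters this size.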
The Talos namespace gotcha
Before running helm install, the longhorn-system namespace needs to exist and be labelled to allow privileged pods. Talos enforces strict pod security policies by default — Longhorn needs privileged access to manage host disks, and without this label the pods start but immediately fail.
kubectl create namespace longhorn-system
kubectl label namespace longhorn-system \
  pod-security.kubernetes.io/enforce=privileged
This step is required regardless of install method. Helm can create the namespace with --create-namespace, but it can't apply the label as part of that — so the namespace must be created and labelled manually first.
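The same two commands can also be captured declaratively, which fits the reproducibility goal of the rest of this setup. A sketch of an equivalent manifest, applied with kubectl apply -f before the Helm install:

```yaml
# Namespace with the pod-security label set at creation time, so the
# label can never be missing when Longhorn's privileged pods are admitted:
apiVersion: v1
kind: Namespace
metadata:
  name: longhorn-system
  labels:
    pod-security.kubernetes.io/enforce: privileged
```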
The install
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --version 1.10.2 \
  --values longhorn-values.yaml
The install produced several warnings before completing. These are expected — each one is Kubernetes auditing a policy violation against the restricted standard, while allowing it through because the namespace is labelled privileged. They're not errors; they're the audit trail of the enforcement decision made when the namespace was labelled.
The warnings fall into four groups by component:
longhorn-manager — the most warnings, because this is the component that actually manages disks on each host. It runs as root, sets privileged=true, and mounts host paths including /dev, /proc, /etc, and /var/mnt/longhorn. All of these are required for low-level block device management and all of them violate restricted policy by design.
longhorn-ui — lighter violations. The UI doesn't touch disks at all; it just doesn't meet restricted's hardening requirements (non-root, capability dropping, seccomp profile). Not a security concern in practice.
longhorn-driver-deployer — installs the CSI driver onto each node, which requires elevated access. Runs explicitly as runAsUser=0 (root), which restricted prohibits.
longhorn-csi-plugin — similar profile to the driver deployer; needs host access to present volumes to pods via the CSI interface.
The pattern across all of them is the same: without the namespace label, these would have been admission rejections and no pods would have been created. With it, they're warnings and the install proceeds. The label is doing exactly the job it was set up to do.
The install completed cleanly:
NAME: longhorn
LAST DEPLOYED: Wed Feb 25 21:38:57 2026
NAMESPACE: longhorn-system
STATUS: deployed
REVISION: 1
DESCRIPTION: Install complete
Watching It Start
Longhorn doesn't come up instantly — there's a startup sequence as components initialise. Running kubectl -n longhorn-system get pods a few seconds after install showed everything beginning to pull and start:
engine-image-ei-843accdd-4ql2b 0/1 ContainerCreating 0 15s
longhorn-manager-9tgvz 2/2 Running 0 47s
longhorn-ui-7dd9bf7459-v99lr 1/1 Running 0 47s
About three minutes later, the full picture:
csi-attacher-7855ffbcd4-28m88 1/1 Running 0 106s
csi-provisioner-85c5fb855-4mkfj 1/1 Running 0 106s
csi-resizer-78565b658d-g2jxh 1/1 Running 0 106s
csi-snapshotter-7dffb49666-2lkj5 1/1 Running 0 106s
engine-image-ei-843accdd-4ql2b 1/1 Running 0 2m30s
instance-manager-17f25a4f51c41e1... 1/1 Running 0 2m
longhorn-csi-plugin-27rn7 3/3 Running 0 106s
longhorn-driver-deployer-5666c5969-pfwhw 1/1 Running 0 3m2s
longhorn-manager-9tgvz 2/2 Running 0 3m2s
longhorn-ui-7dd9bf7459-v99lr 1/1 Running 0 3m2s
Every component running. A few restarts during startup would have been normal — components race to initialise and retry until their dependencies are ready — though this run came up cleanly with zero restarts across the board.
Verification
With everything running, I verified Longhorn could actually see the storage on each node:
kubectl -n longhorn-system get nodes.longhorn.io
NAME READY ALLOWSCHEDULING SCHEDULABLE AGE
rock1 True true True 3m12s
rock2 True true True 2m8s
rock3 True true True 2m44s
rock4 True true True 2m49s
All four nodes ready and schedulable. Then the StorageClass:
kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
longhorn (default) driver.longhorn.io Delete Immediate true 3m39s
longhorn-static driver.longhorn.io Delete Immediate true 3m36s
Two storage classes created automatically. longhorn is the default — workloads can claim persistent volumes without specifying a storage class explicitly. longhorn-static is for pre-provisioned volumes, useful for restoring from backups. ALLOWVOLUMEEXPANSION: true means volumes can be resized after creation without recreating them.
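As a sketch of what claiming that storage will look like for the first workload (the name and size here are illustrative, not from a real deployment): because longhorn is the default class, a PVC doesn't even need to name it.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data          # illustrative name, not a real workload
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi         # Longhorn provisions this and keeps 2 replicas
  # storageClassName omitted -> falls back to the default class, longhorn
```

Applying this would create a Longhorn volume on the NVMe pool, replicated across two nodes, ready to be mounted by a pod.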
What's Next
The Bletchley cluster now has a working distributed storage layer. Four nodes, 1TB of raw NVMe, 500GB usable with 2-replica protection.
Storage without a workload to use it is just infrastructure waiting for a purpose. The next post will be the first actual workload on the cluster — something that puts a persistent volume to use and proves the whole stack works end to end. A deep dive into the decisions behind this Longhorn installation — replica count, version pinning, and upgrade strategy — is planned as a future post once there's more real-world experience to draw from.
← Previous: Upgrading Talos
Questions or suggestions? Leave a comment below or reach out at igor@vluwte.nl.