Installing Talos Linux on TuringPi RK1: Learning from My First Attempt
Documentation isn't optional. How I lost access to my working Talos cluster and why I'm reinstalling from scratch with proper documentation.
Introduction
In November 2025, I successfully installed Talos Linux on my TuringPi cluster. The installation worked. The cluster came up. Everything seemed fine.
Then I lost access.
Not because the cluster failed - but because I didn't document how I was connecting, from where, or which commands needed to run on which machine. When I came back to the cluster a few weeks later, I couldn't remember my workflow. Keys? Configuration files? Which terminal was SSH'd where?
This is the problem with not documenting as you go.
So now I'm starting over. Same hardware, same process, but this time with complete documentation of every step, every connection, every command location. This post captures what I learned from my first attempt and sets the stage for a proper, reproducible installation.
🏠 This is part of the Homelab Journey series - building a production Kubernetes cluster from scratch.
Other posts in this series:
- TuringPi Hardware
- Installing Talos Linux on TuringPi RK1: Learning from My First Attempt (you are here)
- Building the Bletchley Cluster
What Happened the First Time
November 2025: Following an excellent tutorial from xphyr.net, I installed Talos Linux 1.11.5 on all four RK1 modules.
The good:
- Installation process worked smoothly
- Cluster formed successfully
- No major technical issues
The problem:
- I didn't document my connection workflow
- Lost track of where I was running commands (laptop? BMC? SSH session?)
- Couldn't easily get back into the cluster
- No clear path to resume work
After 25+ years managing infrastructure, I should know better: documentation isn't optional. When you're learning a new system (Talos is very different from traditional Linux), you need to write down everything.
Why I'm Starting Over
Rather than trying to reverse-engineer what I did three months ago, I'm wiping the cluster and reinstalling from scratch - this time with complete documentation.
What I'll capture this time:
- Where each command runs (laptop, TuringPi BMC, specific node)
- How connections are established
- Which files live where
- The complete workflow from start to finish
- Troubleshooting steps if something breaks
This makes the cluster reproducible. If I need to rebuild it, add nodes, or help someone else, the documentation exists.
About the Tutorial I'm Following
Credit where it's due: I'm following the excellent xphyr.net guide, "Kubernetes using TuringPi, Talos and RK1".
Why this tutorial works:
- Written specifically for TuringPi + RK1 + Talos combination
- Covers the complete process
- Includes storage setup (Longhorn)
- Real-world configuration examples
My adjustments:
- Version updates (newer Talos versions available)
- Additional documentation of connection workflows
- My specific network configuration
- Lessons learned from first attempt
If you're following along, I highly recommend reading the original tutorial alongside this post.
Version Strategy
One challenge with any infrastructure tutorial: versions change constantly.
What I used in November 2025:
- Talos Linux: 1.11.5
- Kubernetes: v1.34.0 (bundled with Talos)
- Longhorn: 1.10.1 (storage system)
What I'm using for the reinstallation (February 2026):
- Talos Linux: 1.12.4 (current as of February 2026)
- Kubernetes: (bundled with Talos 1.12.4)
- Longhorn: (will check for latest stable version)
What you should do:
- Check for the current Talos version at https://factory.talos.dev/
- Use the latest stable version available when you install
- The Image Factory makes version selection flexible
Version compatibility notes:
- Talos 1.11.5 worked perfectly with RK1 modules in November
- Talos 1.12.4 should have improvements and bug fixes
- The installation process remains consistent across versions
- ARM64 support for RK1 is stable
I'll document the specific versions I use in the actual installation post, but understand that when you read this, newer versions will likely exist. The process remains the same - just update the version number in the Image Factory.
Prerequisites
Before starting, you need:
Hardware
- TuringPi 2 board with RK1 modules installed
- Network connectivity to TuringPi BMC (Baseboard Management Controller)
- Power supply connected
Software on Your Laptop
- talosctl - Talos CLI tool for cluster management
- kubectl - Kubernetes CLI (for later)
- helm - Package manager for Kubernetes (for Longhorn)
- SSH client - For BMC access
Network Requirements
- TuringPi BMC accessible on your network
- DHCP available for RK1 nodes (or plan for static IPs)
- One IP reserved for Kubernetes API VIP (Virtual IP)
Knowledge
- Basic understanding of Kubernetes concepts
- Comfortable with command line
- Understanding of YAML configuration files
Understanding Talos Linux
Before diving into installation, it's worth understanding what makes Talos different.
What is Talos?
Talos Linux is a Linux distribution designed specifically for running Kubernetes. Unlike traditional Linux distributions:
- Immutable: No SSH access, no package manager, no manual changes
- API-driven: Everything configured via YAML and API calls
- Minimal: Only what's needed to run Kubernetes
- Secure by default: No shell access reduces attack surface
Why Talos for This Cluster?
Advantages:
- Perfect for learning Kubernetes "the right way"
- Forces infrastructure-as-code practices
- Minimal resource overhead
- Designed for ARM64 (RK1 modules)
Challenges:
- Can't SSH into nodes to troubleshoot
- Different mental model from traditional Linux
- Learning curve for configuration
For a learning cluster, these challenges are actually features - they force you to understand Kubernetes rather than falling back to traditional Linux debugging.
The Installation Plan
Here's what we'll do in the next post when I actually perform the reinstallation:
Phase 1: Prepare the Image
- Use Talos Image Factory to create RK1-specific image
- Include required system extensions (iSCSI, NFS, Linux-tools)
- Download the image
Phase 2: Flash the Nodes
- Transfer image to TuringPi BMC
- Flash each RK1 module via BMC
- Power on and verify boot
Phase 3: Generate Configuration
- Create configuration patch for our setup
- Generate Talos configs for control plane and worker
- Store these files safely (they're needed for future operations)
Phase 4: Apply Configuration
- Apply config to each node
- Bootstrap the first control plane node
- Wait for cluster formation
Phase 5: Verify and Access
- Check cluster health
- Generate kubeconfig for kubectl access
- Verify all nodes are ready
Phase 6: Storage Setup
- Install Longhorn for distributed storage
- Configure storage classes
- Test persistent volumes
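In the spirit of "document as you go," Phases 3-5 can be captured up front as a runbook script rather than typed ad hoc. This is a sketch, not the final version: the cluster name `bletchley`, the node IP `10.0.0.101`, and the patch filename are assumptions for my setup, and the VIP is the one I'll settle on later in this post.

```shell
# Write the Phase 3-5 runbook to a file (saved, not executed here).
# Cluster name, node IP, and filenames are assumptions -- adjust to taste.
cat > talos-runbook.sh <<'EOF'
#!/bin/sh
set -eu

# Phase 3 (on your laptop): generate configs, with the VIP as the endpoint.
talosctl gen config bletchley https://10.0.0.65:6443 \
  --config-patch-control-plane @cp.patch.yaml \
  --output-dir "$HOME/talos-cluster"

# Phase 4 (on your laptop): apply config to the first node, then bootstrap it.
talosctl apply-config --insecure --nodes 10.0.0.101 \
  --file "$HOME/talos-cluster/controlplane.yaml"
talosctl --talosconfig "$HOME/talos-cluster/talosconfig" \
  --nodes 10.0.0.101 --endpoints 10.0.0.101 bootstrap

# Phase 5 (on your laptop): verify health and fetch the kubeconfig.
talosctl --talosconfig "$HOME/talos-cluster/talosconfig" \
  --nodes 10.0.0.101 --endpoints 10.0.0.101 health
talosctl --talosconfig "$HOME/talos-cluster/talosconfig" \
  --nodes 10.0.0.101 --endpoints 10.0.0.101 kubeconfig "$HOME/talos-cluster"
EOF
chmod +x talos-runbook.sh
```

Keeping the commands in a script from the start means the runbook and the documentation are the same artifact - exactly what was missing the first time.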
Key Lessons from First Attempt
Before I actually redo the installation, here are the lessons learned that will shape the new documentation:
1. Document Connection Context
Problem: Lost track of which machine I was running commands on
Solution: Every command block will specify:
```shell
# On your laptop:
$ talosctl version

# On the TuringPi BMC (via SSH):
$ tpi uart -n 1 get

# Context matters!
```
2. Save All Config Files
Problem: Generated configs existed only on one machine
Solution: Store all generated files in a dedicated directory:
```
~/talos-cluster/
├── talosconfig        # Talos cluster config
├── kubeconfig         # Kubernetes config
├── controlplane.yaml  # Control plane config
├── worker.yaml        # Worker config
└── cp.patch.yaml      # Configuration patches
```
Back these up immediately after generation.
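A minimal sketch of that backup step, assuming the directory layout above (the backup destination is a placeholder - point it at wherever your real backups live):

```shell
# Snapshot the whole config directory into a timestamped archive.
# CLUSTER_DIR and BACKUP_DIR are assumptions for my machine.
CLUSTER_DIR="$HOME/talos-cluster"
BACKUP_DIR="$HOME/talos-backups"
mkdir -p "$CLUSTER_DIR" "$BACKUP_DIR"
STAMP=$(date +%Y%m%d-%H%M%S)
tar czf "$BACKUP_DIR/talos-cluster-$STAMP.tar.gz" -C "$HOME" talos-cluster
ls -lh "$BACKUP_DIR"
```

Run it immediately after `talosctl gen config` - these files are the only way back into the cluster.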
3. Keep a Command History
Problem: Couldn't remember the sequence of steps
Solution: Maintain a commands.md file with timestamped history of what was run and when.
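One way to keep that history honest is to log each command as it runs. Here's a tiny sketch; `log_run` is a throwaway helper name I'm inventing for this post, not part of any tool:

```shell
# Append a timestamped entry to commands.md, then run the command.
# "log_run" is a made-up name for this sketch.
log_run() {
  printf '%s  %s\n' "$(date -Iseconds)" "$*" >> commands.md
  "$@"
}

log_run echo "checking talosctl is on PATH"
cat commands.md
```

It's crude, but a crude log beats the perfect memory I turned out not to have.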
4. Document the "Why"
Problem: Configuration options were unclear later
Solution: Comment every significant configuration choice:
- Why use eMMC for OS?
- Why enable iSCSI extensions?
- Why this specific VIP address?
5. Test Access Immediately
Problem: Assumed access would "just work" later
Solution: After each phase, verify:
- Can I still connect?
- Are my credentials working?
- Can I get back in from a cold start?
Configuration Decisions for My Cluster
Here are the key configuration choices I'll make (and why):
Storage Layout
OS Storage: 32GB eMMC per node
- Talos installation lives here
- Keeps OS separate from application data
- Easy to reflash if needed
Application Storage: 250GB NVMe per node
- Longhorn will pool this across nodes
- Persistent volumes for applications
- Fast local storage
Bulk Storage: 2x 900GB SATA (Node 3 only)
- Future use for backups or NFS
- Not part of initial installation
Network Configuration
VIP (Virtual IP): 10.0.0.65
- Single IP for Kubernetes API access
- Shared across control plane nodes
- kubectl connects here
Node IPs: DHCP initially
- Nodes will get IPs automatically
- Document what they get
- Consider static IPs later if needed
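In Talos terms, the VIP-plus-DHCP decision becomes a small machine-config patch applied to the control plane nodes. This is a sketch of what my cp.patch.yaml will roughly contain; the interface name `eth0` is an assumption until I verify it on the actual RK1 nodes:

```yaml
# cp.patch.yaml (sketch) -- control plane nodes only.
machine:
  network:
    interfaces:
      - interface: eth0   # assumption: verify the real interface name on RK1
        dhcp: true        # node IPs via DHCP, as decided above
        vip:
          ip: 10.0.0.65   # shared Kubernetes API VIP
```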
Cluster Topology
Control Plane: Nodes 1, 2, 3 (all RK1 modules with 8GB)
Worker: Node 4 (also control plane capable, but designated as worker initially)
Why this layout:
- 3-node control plane provides HA (High Availability)
- Small cluster means we'll enable scheduling on control plane
- Node 4 as "worker" gives us testing flexibility
System Extensions
iSCSI tools and Linux-tools: Required for Longhorn storage
NFS daemon: Planned for future NFS exports
These are included in the Image Factory build.
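On the Image Factory side, extensions are selected via a schematic. A sketch of what mine should look like - I believe the iSCSI and util-linux extensions are named as shown, but verify all names against the factory's current list before building:

```yaml
# Image Factory schematic (sketch) -- check extension names
# at https://factory.talos.dev/ before building.
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools      # required by Longhorn
      - siderolabs/util-linux-tools # required by Longhorn
      - siderolabs/nfsd             # assumption: NFS daemon extension name
```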
What's Different This Time
Comparing my November attempt to this new installation:
| Aspect | First Attempt (Nov 2025) | This Time (Feb 2026) |
|---|---|---|
| Documentation | Minimal notes | Complete workflow |
| Connection tracking | Lost context | Every command labeled |
| File management | Scattered | Organized directory |
| Access verification | Assumed it worked | Test at each phase |
| Troubleshooting | Wing it | Document problems |
| Reproducibility | Can't recreate | Fully documented |
The goal isn't just to get it working - it's to create documentation that lets me (or anyone else) rebuild this cluster from scratch at any time.
Why Document a Reinstallation?
You might ask: why write a blog post about planning to reinstall, rather than just doing it and documenting the actual installation?
Because the planning matters.
Understanding what went wrong the first time, why documentation is critical, and what the strategy is for the second attempt - that's valuable content. It shows the real process of learning infrastructure.
The next post will be the actual installation with complete step-by-step documentation. But this post sets the context: why we're doing it this way, what we learned from the first attempt, and how the documentation will be structured.
Coming Next
Next post: The actual Talos installation with complete documentation:
- Every command with context
- All configuration files explained
- Connection workflow documented
- Troubleshooting steps if things go wrong
- Verification at each phase
- Storage setup with Longhorn
- Final cluster ready for workloads
After that: MetalLB configuration for load balancing.
Lessons from 25+ Years in Infrastructure
A few principles that apply here:
1. Documentation is Part of the Work
Not an afterthought. Not "I'll document it later." The work isn't done until it's documented.
2. Future You is a Different Person
What's obvious now won't be obvious in three months. Write for someone who knows nothing about what you did.
3. Reproducibility is Reliability
If you can't reproduce it, you can't maintain it. Documentation makes reproduction possible.
4. Test Your Documentation
The only way to know if documentation works is to follow it yourself. That's what this reinstallation will prove.
Summary
I got Talos working on my TuringPi cluster in November 2025. Then I lost access because I didn't document my workflow properly.
Rather than cobbling together access from incomplete notes, I'm starting fresh with complete documentation. Same tutorial, same process, but this time every step will be documented with context, file locations, and connection details.
The next post will be the actual installation. This post is the setup: why we're doing it, what we learned from the first attempt, and how the documentation will be structured differently.
If you're planning a similar cluster: Learn from my mistake. Document as you go. Keep track of where commands run. Save all config files. Test your access workflow.
It's not enough to make it work once. You need to be able to make it work again.
← Previous: TuringPi Hardware
→ Next: Building the Bletchley Cluster
Questions or suggestions? Leave a comment below or reach out at igor@vluwte.nl.