Secret Management Part 2: Securing the Remaining Cluster Secrets

Seven namespaces, eight ExternalSecrets, zero static tokens. Migrating every cluster credential to OpenBao, and learning which secrets don't need a vault at all.

Introduction

Part 1 of this series installed OpenBao and the External Secrets Operator, migrated the TransIP API key, and proved the pattern: store credentials in OpenBao, sync them into Kubernetes Secrets via ESO, verify the workload still functions. One secret, one namespace, one proof of concept.

This post applies that pattern to the rest of the cluster. What looked like a straightforward "do it again for each namespace" quickly turned into something more interesting. The first decision — how ESO should authenticate with OpenBao — led to trying the obvious approach, hitting a real limitation, and choosing something better. Several secrets that looked like credentials turned out not to be. And a few migrations had their own specific challenges that are worth documenting.

One of the triggers for this work was a discovery earlier in the cluster's life: Talos machine configs were accidentally committed to git. The files were cleaned up, but the incident made it clear that a systematic approach to secret management was overdue — not just migrating one API key as in Part 1, but auditing and securing everything.

By the end, every credential on the cluster is managed by ESO, nothing sensitive is gitignored, and the Forgejo repository is fully committed with no credential gaps.

🏠 This is part of the Homelab Journey series - building a production Kubernetes cluster from scratch.

Secret Management Part 1: OpenBao and the First Secret
Secret Management Part 2: Securing the Remaining Cluster Secrets (you are here)

This post assumes OpenBao and ESO are installed and the TransIP secret is already migrated. If you're starting from scratch, check out Secret Management Part 1 first.

The Secret Audit

Before migrating anything, the first step is to understand what's actually there. One command gives the full picture, excluding service account tokens and Helm release secrets which are managed by Kubernetes and never need migrating:

kubectl get secrets -A | grep -v 'kubernetes.io/service-account-token\|helm.sh'

The output was longer than expected. The original plan going in had four targets: Authelia session secrets, the Authelia user database, Garage S3 credentials, and Alertmanager SMTP. The audit found fourteen secrets in total. More importantly, it changed how several of them were classified — some things that looked like credentials turned out to be deployment artefacts, and a few that weren't in the plan at all turned out to need migrating.

Working through each one, everything falls into one of four categories:

Credentials that need migrating: externally set values that live in OpenBao and sync via ESO. Garage S3 credentials, Longhorn backup credentials, Authelia secrets, Alertmanager SMTP, and the Forgejo and talos-backup S3 credentials all fall here.

Infrastructure secrets: TLS certificates issued by cert-manager, webhook CAs, ACME account keys, MetalLB and ESO infrastructure. These are managed by the tools that create them. No action needed.

Chart-managed config: secrets assembled by Helm charts from values at install time. The Forgejo chart secrets (forgejo, forgejo-init, forgejo-inline-config) contain shell scripts and app.ini sections generated from Helm values. The talos-backup-secrets secret contains a single config key, a Talos ServiceAccount token generated by Talos itself. None of these are external credentials. Migrating them to OpenBao would accomplish nothing and would break the chart's ability to manage them.

Stale install-time credentials: this category deserves a longer explanation.

Not Every Secret Needs a Vault

grafana in the monitoring namespace has three keys: admin-user, admin-password, and ldap-toml. At first glance this looks like something to migrate. The Grafana admin password should be in OpenBao, right?

Not necessarily. Looking at the values file tells the real story:

adminPassword: "changeme"   # real value set via Grafana UI; this is a deployment placeholder

The changeme value was intentional from day one. Grafana reads this secret once, during initial pod startup. The password was changed via the Grafana UI immediately after install, and Grafana has used its own internal database for authentication ever since. The Kubernetes Secret is a deployment artefact that served its purpose and is now inert.

forgejo-admin is the same pattern, made even more explicit by the chart:

username: a-igor
password: "changeme"     # change immediately post-install via UI; inert after first boot
email: "a-igor@vluwte.nl"
passwordMode: initialOnlyNoReset     # set once on first pod start; never auto-reset

passwordMode: initialOnlyNoReset is the Forgejo chart's way of saying: this credential is used exactly once, during the init container on first boot, and never again. The real password lives in Forgejo's database. The Kubernetes Secret is noise.

Both grafana and forgejo-admin were in the original migration plan as credentials to migrate. Inspection moved them out of scope. The original plan was wrong — not because it was carelessly written, but because you don't fully know what you have until you look. Plans about secrets are often incomplete in exactly this way.

The lesson: before migrating a secret, ask whether it's actually a live credential. A changeme placeholder with a comment explaining its lifecycle is documentation, not a security gap. Putting it in OpenBao would add operational overhead for no security benefit.

Authentication: Staying Consistent with Kubernetes Auth

Part 1 already established the authentication pattern: ESO uses the Kubernetes auth method to connect to OpenBao. The SecretStore in cert-manager uses auth.kubernetes with a dedicated role. No static tokens anywhere in the chain.

The initial plan for Part 2 called for creating per-namespace static tokens — a token stored in a Kubernetes Secret, referenced by each SecretStore. This is a documented pattern and the plan was written before stepping back to think it through. Following it, the first token was created:

bao token create -policy=garage-eso -orphan -display-name=eso-garage

It worked. Then two things became obvious in quick succession.

First: this had already been done differently in Part 1. The cert-manager SecretStore uses Kubernetes auth. Why would the new namespaces use a different approach?

Second: the output showed token_duration: 768h. OpenBao's default max_lease_ttl caps tokens at 32 days. Six namespaces with 32-day tokens means six tokens to rotate, six Kubernetes Secrets to update, six 1Password entries to maintain — on a recurring schedule, indefinitely. As the cluster grows, that number grows with it.
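The rotation burden is easy to put numbers on. The following is a back-of-the-envelope calculation, assuming each token is rotated once per full 32-day lifetime; a real schedule would rotate somewhat earlier, so the result is a lower bound.

```shell
#!/bin/sh
# Back-of-the-envelope rotation load for static tokens.
ttl_hours=768          # token_duration reported by `bao token create`
namespaces=6           # one static token per namespace

ttl_days=$((ttl_hours / 24))
# Integer division: full token lifetimes that fit in a year (lower bound).
rotations_per_token=$((365 / ttl_days))
rotations_per_year=$((rotations_per_token * namespaces))

echo "token lifetime: ${ttl_days} days"
echo "rotations per year across ${namespaces} namespaces: at least ${rotations_per_year}"
```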

The right answer was already in use. Kubernetes auth is the most common pattern for in-cluster workloads: ESO presents its ServiceAccount JWT to OpenBao, OpenBao verifies it against the Kubernetes API, and issues a short-lived token scoped to the relevant policy for that one fetch. Nothing static anywhere in the chain. Adding a new namespace requires only a role binding — no token, no 1Password entry, no rotation schedule.
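For readers who want to see what that exchange looks like on the wire, here is a minimal sketch of the login request against the Kubernetes auth endpoint. The BAO_ADDR, ROLE and JWT_FILE defaults are assumptions for illustration, and bao_login only succeeds from a pod whose ServiceAccount is bound to the role, which in this cluster means the ESO pod itself.

```shell
#!/bin/sh
# Sketch of the Kubernetes auth login that ESO performs when it fetches a secret.
# BAO_ADDR and ROLE are illustrative defaults; adjust them for your cluster.
BAO_ADDR="${BAO_ADDR:-http://openbao.openbao.svc:8200}"
ROLE="${ROLE:-garage-eso}"
# Standard location of the ServiceAccount token inside a pod.
JWT_FILE="${JWT_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/token}"

# Build the JSON body for POST /v1/auth/kubernetes/login.
login_payload() {
  # $1 = role name, $2 = ServiceAccount JWT
  printf '{"role":"%s","jwt":"%s"}' "$1" "$2"
}

# Exchange the ServiceAccount JWT for a short-lived OpenBao token.
# Not called here: it needs network access and a bound ServiceAccount.
bao_login() {
  curl -sS --request POST \
    --data "$(login_payload "$ROLE" "$(cat "$JWT_FILE")")" \
    "$BAO_ADDR/v1/auth/kubernetes/login"
}

# Show the request shape with a placeholder JWT.
login_payload "$ROLE" "<serviceaccount-jwt>"
echo
```

A successful response carries the issued token under auth.client_token, which the caller then uses for the KV read.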

The test token was revoked and the plan updated. Kubernetes auth consistently across all namespaces.

Setting up the new namespaces requires root — creating policies and roles is an administrative operation that the kv-admin token can't perform (confirmed with a 403 on bao policy read kv-admin). The root token ceremony is covered in detail in Part 1 — the short version is: bao operator generate-root -init, provide two of three Shamir key shares from 1Password, decode the result with the OTP. The ceremony was run once to create everything needed, then root was revoked.

Kubernetes auth was already enabled from Part 1 (confirmed via bao auth list), so the setup was creating policies and roles:

# One policy per namespace — read-only access to that namespace's KV path
for policy in garage longhorn authelia monitoring forgejo talos-backup
do
  bao policy write ${policy}-eso - <<EOF
path "secret/data/${policy}/*" {
  capabilities = ["read"]
}
EOF
done

# One role per namespace — binds ESO's ServiceAccount to the policy
# ESO runs as 'external-secrets' in the 'external-secrets' namespace for all roles
for policy in garage longhorn authelia monitoring forgejo talos-backup
do
  bao write auth/kubernetes/role/${policy}-eso \
    bound_service_account_names=external-secrets \
    bound_service_account_namespaces=external-secrets \
    policies=${policy}-eso \
    ttl=1h
done

The ESO ServiceAccount is external-secrets in the external-secrets namespace for all roles — confirmed by checking the running pod spec. Every role uses the same ServiceAccount; only the role name and policy path differ.

The SecretStore Pattern

Every namespace that has secrets to manage needs its own SecretStore. This is the same least-privilege design from Part 1 — a namespace-scoped SecretStore means ESO in garage can only authenticate as garage-eso, with read access to secret/data/garage/* and nothing else.

The SecretStore YAML is nearly identical for every namespace:

apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: openbao
  namespace: garage
spec:
  provider:
    vault:
      server: "http://openbao.openbao.svc:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "garage-eso"

Change the namespace and role for each namespace. That's it. The SecretStore files live alongside the workload they serve — apps/storage-ops/secretstore-garage.yaml, apps/authelia/secretstore-authelia.yaml, and so on.
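Since only two fields vary, the manifests can be rendered from a single template. This is a sketch rather than the way the files in the repository were produced: OUT_DIR is a hypothetical scratch directory, and the namespace-to-role pairs are inferred from the policy names above (the longhorn-system namespace uses the longhorn-eso role).

```shell
#!/bin/sh
# Render one SecretStore manifest per namespace from a single template.
# OUT_DIR is a hypothetical scratch directory, not the repository layout.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

render_secretstore() {
  # $1 = Kubernetes namespace, $2 = OpenBao role
  cat <<EOF
apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: openbao
  namespace: $1
spec:
  provider:
    vault:
      server: "http://openbao.openbao.svc:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "$2"
EOF
}

# namespace:role pairs; longhorn-system maps to the longhorn-eso role.
for pair in garage:garage-eso longhorn-system:longhorn-eso authelia:authelia-eso \
            monitoring:monitoring-eso forgejo:forgejo-eso talos-backup:talos-backup-eso
do
  ns=${pair%%:*}
  role=${pair#*:}
  render_secretstore "$ns" "$role" > "$OUT_DIR/secretstore-$ns.yaml"
done

ls "$OUT_DIR"
```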

Migrating the Credentials

All bao kv put operations run from inside the OpenBao pod using the kv-admin token. This keeps the token off the local machine — no shell history, no risk of it ending up in a file. The setup is the same for every migration step:

# Enter the OpenBao pod
kubectl exec -it -n openbao openbao-0 -- /bin/sh

# Set the kv-admin token
export VAULT_TOKEN=<kv-admin-token>

# Confirm you're on the right token
bao token lookup
# display_name: token-kv-admin
# policies: [default kv-admin]

The kubectl commands for reading existing secrets and applying SecretStores and ExternalSecrets run from outside the pod as normal.

Garage S3 Credentials

The garage-secrets secret in the garage namespace contains three keys. Reading them before writing anything to OpenBao is the right habit:

kubectl get secret garage-secrets -n garage -o jsonpath='{.data}' | jq 'keys'
# ["admin_token", "garage.toml", "rpc_secret"]

Two of these — admin_token and rpc_secret — are also embedded inside garage.toml, which is the entire Garage configuration file stored as a single value. This duplication is worth noting: if rpc_secret is ever rotated, it needs to be updated in both the standalone key and inside the garage.toml content, or they drift out of sync.
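A small check can catch that drift before anything is written to OpenBao. The sketch below compares a standalone value against the copy embedded in a TOML file; the temp file and the example-rpc-secret value are placeholders, and toml_value is a hypothetical helper that only handles simple key = "value" lines.

```shell
#!/bin/sh
# Check that the standalone rpc_secret matches the copy embedded in garage.toml.
# The file and the sample values are placeholders for illustration only.
toml_file="$(mktemp)"
cat > "$toml_file" <<'EOF'
metadata_dir = "/mnt/meta"
data_dir = "/mnt/data"
rpc_secret = "example-rpc-secret"
EOF

# Print the quoted value of the first `key = "value"` line for the given key.
toml_value() {
  # $1 = key, $2 = file
  sed -n "s/^[[:space:]]*${1}[[:space:]]*=[[:space:]]*\"\\(.*\\)\".*/\\1/p" "$2" | head -n 1
}

standalone_rpc_secret="example-rpc-secret"   # value of the separate rpc_secret key
embedded_rpc_secret="$(toml_value rpc_secret "$toml_file")"

if [ "$standalone_rpc_secret" = "$embedded_rpc_secret" ]; then
  echo "rpc_secret: in sync"
else
  echo "rpc_secret: DRIFT between standalone key and garage.toml" >&2
fi
rm -f "$toml_file"
```

The same helper works for admin_token by passing that key name instead.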

The garage.toml value is multi-line, so it needs the file input syntax:

# Quoted delimiter so any $ in the values is written literally
cat > /tmp/garage.toml <<'EOF'
metadata_dir = "/mnt/meta"
data_dir = "/mnt/data"
...
rpc_secret = "<rpc_secret>"
...
admin_token = "<admin_token>"
EOF

bao kv put secret/garage/credentials \
  admin_token=<admin_token> \
  rpc_secret=<rpc_secret> \
  garage.toml=@/tmp/garage.toml

# clean up temp file
rm /tmp/garage.toml

The ExternalSecret maps each OpenBao key back to its Kubernetes Secret key, with target.name: garage-secrets to preserve the existing secret name so Garage needs no reconfiguration.
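A manifest along these lines would express that mapping. It is a sketch, not a copy of the file in the repository: MANIFEST is a hypothetical scratch path, the refreshInterval of 1h matches what the cluster reports, and the explicit per-key data entries are one way to write it.

```shell
#!/bin/sh
# Write an ExternalSecret manifest for garage-secrets to a scratch file.
# MANIFEST is a hypothetical path chosen for this example.
MANIFEST="${MANIFEST:-$(mktemp)}"

cat > "$MANIFEST" <<'EOF'
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: garage-secrets
  namespace: garage
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao
    kind: SecretStore
  target:
    name: garage-secrets        # keep the existing Secret name
    creationPolicy: Owner
  data:
    - secretKey: admin_token
      remoteRef:
        key: garage/credentials
        property: admin_token
    - secretKey: rpc_secret
      remoteRef:
        key: garage/credentials
        property: rpc_secret
    - secretKey: garage.toml
      remoteRef:
        key: garage/credentials
        property: garage.toml
EOF

echo "wrote $MANIFEST"
# Review the file, then apply it from outside the pod:
#   kubectl apply -f "$MANIFEST"
```

The remoteRef.key is relative to the SecretStore's path, so secret/garage/credentials in OpenBao becomes garage/credentials here.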

Verification — confirm the secret synced, keys are correct, and Garage is still running:

# ExternalSecret should show SecretSynced
kubectl get externalsecret garage-secrets -n garage

# Keys should match the original three
kubectl get secret garage-secrets -n garage -o jsonpath='{.data}' | jq 'keys'
# ["admin_token", "garage.toml", "rpc_secret"]

# Pod should be running with no restarts
kubectl get pods -n garage

Longhorn Backup Credentials

longhorn-backup-secret in longhorn-system was not in the original inventory — found in the audit. It's the secret Longhorn uses to connect to Garage as a backup target, with four keys: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINTS, and AWS_CERT. The last two are not sensitive — AWS_ENDPOINTS is the internal cluster address and AWS_CERT is empty (no TLS cert needed for the internal endpoint). Both are included in the migration to keep the secret shape identical.

Same pattern as Garage: read keys, store in OpenBao, SecretStore, ExternalSecret. Verification:

# ExternalSecret should show SecretSynced
kubectl get externalsecret longhorn-backup-secret -n longhorn-system

# Keys should match the original four
kubectl get secret longhorn-backup-secret -n longhorn-system -o jsonpath='{.data}' | jq 'keys'
# ["AWS_ACCESS_KEY_ID", "AWS_CERT", "AWS_ENDPOINTS", "AWS_SECRET_ACCESS_KEY"]

# All Longhorn pods should be running with no unexpected restarts
kubectl get pods -n longhorn-system

Authelia User Database

The authelia-users secret contains a single key: users_database.yml, whose value is the entire YAML file including the Argon2id password hash. The file-based approach from Part 1 (storing multi-line content with @filename) applies here too.

There's one gotcha with Argon2id hashes: they contain $ characters, which the shell treats as variable references inside an unquoted heredoc. Each reference expands to an empty string, which silently mangles the hash:

# Wrong: the shell expands $argon2id, $v and $m to empty strings, so the hash is mangled
cat > /tmp/users_database.yml <<EOF
users:
  igor:
    password: "$argon2id$v=19$m=65536..."
EOF

The fix is to single-quote the heredoc delimiter, which tells the shell to treat the contents literally:

# Correct — single-quoted EOF prevents variable expansion
cat > /tmp/users_database.yml <<'EOF'
users:
  igor:
    displayname: Igor
    password: "$argon2id$v=19$m=65536,t=3,p=4$<salt>$<hash>"
    email: igor@vluwte.nl
    groups:
      - admins
EOF

Always verify the file content before storing it — confirm the password hash starts with $argon2id and is complete:

cat /tmp/users_database.yml
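The same check can be scripted, and the failure mode is easy to reproduce safely. The sketch below writes the same line through an unquoted and a quoted heredoc and tests each result; the salt and hash segments are made-up placeholders, and has_argon2id_hash is a hypothetical helper name.

```shell
#!/bin/sh
# Demonstrate how an unquoted heredoc mangles an Argon2id hash.
# The salt and hash segments are made-up placeholders.
work_dir="$(mktemp -d)"

# Unquoted delimiter: $argon2id, $v, $m, ... are expanded as shell variables.
cat > "$work_dir/unquoted.yml" <<EOF
password: "$argon2id$v=19$m=65536,t=3,p=4$c2FsdA$aGFzaA"
EOF

# Quoted delimiter: the contents are written literally.
cat > "$work_dir/quoted.yml" <<'EOF'
password: "$argon2id$v=19$m=65536,t=3,p=4$c2FsdA$aGFzaA"
EOF

# Succeeds only if the file contains a literal "$argon2id$" prefix.
has_argon2id_hash() {
  grep -q 'password: "\$argon2id\$' "$1"
}

for f in unquoted quoted; do
  if has_argon2id_hash "$work_dir/$f.yml"; then
    echo "$f: hash intact"
  else
    echo "$f: hash mangled -> $(cat "$work_dir/$f.yml")"
  fi
done
```

On a shell where none of those variable names happen to be set, the unquoted variant comes out as password: "=19=65536,t=3,p=4", which is still valid YAML and therefore easy to miss.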

Then store in OpenBao and clean up:

bao kv put secret/authelia/users-database \
  users_database.yml=@/tmp/users_database.yml

# clean up temp file
rm /tmp/users_database.yml

Verify before moving on — given the $ truncation risk, check the secret content directly:

# ExternalSecret should show SecretSynced
kubectl get externalsecret authelia-users -n authelia

# Decode the stored value and confirm the hash is complete
kubectl get secret authelia-users -n authelia \
  -o jsonpath='{.data.users_database\.yml}' | base64 -d
# First line should be "users:" and the password hash should start with $argon2id

The force-sync annotation is useful when you need ESO to pick up a corrected value immediately rather than waiting for the next refresh cycle:

kubectl annotate externalsecret authelia-users -n authelia \
  force-sync=$(date +%s) --overwrite

Authelia Chart-Generated Secrets

The authelia secret contains three chart-generated keys:

identity_validation.reset_password.jwt.hmac.key
session.encryption.key
storage.encryption.key

These are randomly generated by the chart on first install. The migration has to preserve the exact values — if they change, Authelia loses its session and storage encryption, invalidating all active sessions and making the database unreadable. The key names must also be reproduced exactly, including the dots.

Dots in key names are handled correctly by both OpenBao and ESO — quoting them in the bao kv put command is sufficient:

bao kv put secret/authelia/chart-secrets \
  "identity_validation.reset_password.jwt.hmac.key"=<decoded-value> \
  "session.encryption.key"=<decoded-value> \
  "storage.encryption.key"=<decoded-value>

After storing the values and creating the ExternalSecret, the Authelia Helm values need one addition to tell the chart to stop generating its own secret and use the one ESO provides:

secret:
  existingSecret: authelia

This is a root-level key. The Helm upgrade triggers a rolling restart, which immediately revealed a secondary issue: Authelia uses a PVC for storage, and Longhorn's default single-node attachment constraint means the new pod can't attach the volume until the old pod releases it. Deleting the terminating pod manually releases the PVC and the new pod comes up cleanly.

Verification — with Authelia this is especially important given the key name sensitivity and the downtime involved:

# ExternalSecret should show SecretSynced
kubectl get externalsecret authelia-chart-secrets -n authelia

# Confirm all three dot-separated key names are present
kubectl get secret authelia -n authelia -o jsonpath='{.data}' | jq 'keys'
# ["identity_validation.reset_password.jwt.hmac.key", "session.encryption.key", "storage.encryption.key"]

# Pod should be running with no crash loops
kubectl get pods -n authelia

# End-to-end: log in to any Authelia-protected service

Alertmanager SMTP Credentials

Unlike the other migrations, this one wasn't moving an existing credential — it was creating a new one. Alertmanager was previously configured to use an open relay on port 25. Adding authentication required a new mail account (bletchley@vluwte.nl) on the Postfix/Dovecot server, using SHA512-CRYPT for the password hash.

One bletchley@vluwte.nl account handles all cluster outbound mail. Per-application accounts would give more granular audit trails in mail logs, but for a single-person homelab that granularity isn't worth the overhead. The From: header identifies the sending application; the auth account is always bletchley@vluwte.nl.

The prometheus-community standalone chart (not kube-prometheus-stack) uses extraSecretMounts to mount secrets into the Alertmanager pod — not alertmanagerSpec.secrets, which is a kube-prometheus-stack pattern:

alertmanager:
  extraSecretMounts:
    - name: alertmanager-smtp
      mountPath: /etc/alertmanager/secrets/alertmanager-smtp
      secretName: alertmanager-smtp
      readOnly: true
  config:
    global:
      smtp_smarthost: 'mail.luwte.net:587'
      smtp_from: 'alerts@vluwte.nl'
      smtp_require_tls: true
      smtp_auth_username: bletchley@vluwte.nl
      smtp_auth_password_file: /etc/alertmanager/secrets/alertmanager-smtp/password

Two things to watch. First, config: must be a sibling of alertmanagerSpec:, not nested inside it; moving it inside wipes the custom config entirely and reverts to defaults. Second, the SMTP hostname must match a name on the server's TLS certificate: smtp.luwte.net failed with a certificate mismatch; mail.luwte.net worked.

Verification is a test alert delivered end-to-end:

# Terminal 1: forward the Alertmanager port (blocks until interrupted)
kubectl port-forward -n monitoring svc/prometheus-alertmanager 9093:9093

# Terminal 2: post a test alert
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"info"}}]'

Forgejo Backup S3 and talos-backup S3

Both follow the same straightforward KV pattern. The Forgejo backup job reads access-key-id, secret-access-key, and endpoint from its secret; the talos-backup job reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Both were verified by triggering the respective jobs manually after migration:

kubectl create job --from=cronjob/forgejo-backup -n forgejo forgejo-backup-manual-test
kubectl create job --from=cronjob/talos-backup -n talos-backup talos-backup-manual-test

Both completed successfully within seconds.

The Migration Pattern in Practice

Every migration followed the same sequence, which is worth making explicit for reference:

1. Read the existing secret's key names: kubectl get secret <name> -n <ns> -o jsonpath='{.data}' | jq 'keys'
2. Decode the values: kubectl get secret <name> -n <ns> -o go-template='{{range $k,$v := .data}}{{$k}}={{$v | base64decode}}{{"\n"}}{{end}}'
3. Store in OpenBao using the exact key names
4. Apply the SecretStore for the namespace
5. Start a watch: kubectl get externalsecrets -n <ns> -w
6. Delete the existing Kubernetes Secret
7. Apply the ExternalSecret
8. Watch for SecretSynced
9. Verify keys match: kubectl get secret <name> -n <ns> -o jsonpath='{.data}' | jq 'keys'
10. Verify the workload is still healthy

The delete-before-apply order matters. ESO with creationPolicy: Owner cannot adopt a secret it didn't create — the old secret must be deleted first so ESO can create a fresh one with ownership. The window between deletion and ESO recreating it is typically under a second.
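The delete, apply and verify steps can be wrapped in a small helper so the order is never applied the wrong way round. This is a sketch under assumptions: migrate_secret and run are hypothetical names, externalsecret-garage.yaml is a placeholder file name, and DRY_RUN defaults to 1 here so the script only prints what it would execute.

```shell
#!/bin/sh
# Dry-run wrapper for the delete/apply/verify part of the sequence.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

migrate_secret() {
  # $1 = namespace, $2 = Secret name, $3 = ExternalSecret name, $4 = manifest file
  ns=$1; secret=$2; es=$3; manifest=$4
  # Delete first so ESO can create a fresh Secret that it owns.
  run kubectl delete secret "$secret" -n "$ns"
  run kubectl apply -f "$manifest"
  # Block until the ExternalSecret reports Ready, or give up after 60 seconds.
  run kubectl wait --for=condition=Ready "externalsecret/$es" -n "$ns" --timeout=60s
  # Print the recreated Secret's data so the key names can be compared.
  run kubectl get secret "$secret" -n "$ns" -o "jsonpath={.data}"
}

migrate_secret garage garage-secrets garage-secrets externalsecret-garage.yaml
```

Reading, decoding and storing the values stay manual on purpose, since those steps involve handling the secret material itself.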

Final State

kubectl get externalsecrets -A

NAMESPACE         NAME                     STORETYPE     STORE     REFRESH INTERVAL   STATUS         READY
authelia          authelia-chart-secrets   SecretStore   openbao   1h                 SecretSynced   True
authelia          authelia-users           SecretStore   openbao   1h                 SecretSynced   True
cert-manager      transip-secret           SecretStore   openbao   1h                 SecretSynced   True
forgejo           forgejo-backup-s3        SecretStore   openbao   1h                 SecretSynced   True
garage            garage-secrets           SecretStore   openbao   1h                 SecretSynced   True
longhorn-system   longhorn-backup-secret   SecretStore   openbao   1h                 SecretSynced   True
monitoring        alertmanager-smtp        SecretStore   openbao   1h                 SecretSynced   True
talos-backup      talos-backup-s3          SecretStore   openbao   1h                 SecretSynced   True

Eight ExternalSecrets, all SecretSynced. No plain Kubernetes Secrets for credentials. The git status output showed only the new SecretStore and ExternalSecret files, updated values files, and README additions — no gitignored credential files remaining.
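The counts can be checked mechanically instead of by eye. The sketch below summarises a listing in the column layout shown above; summarise_externalsecrets is a hypothetical helper, and it assumes READY is the last column and the namespace the first.

```shell
#!/bin/sh
# Summarise `kubectl get externalsecrets -A --no-headers` output read from stdin:
# total count, distinct namespaces, and anything whose READY column is not True.
summarise_externalsecrets() {
  awk '
    { total++; ns[$1] = 1 }
    $NF != "True" { notready++; print "NOT READY: " $1 "/" $2 }
    END {
      n = 0; for (k in ns) n++
      printf "%d ExternalSecrets across %d namespaces, %d not ready\n", total, n, notready
    }
  '
}

# Live usage (outside the pod):
#   kubectl get externalsecrets -A --no-headers | summarise_externalsecrets
# Offline example using the listing above:
summarise_externalsecrets <<'EOF'
authelia          authelia-chart-secrets   SecretStore   openbao   1h   SecretSynced   True
authelia          authelia-users           SecretStore   openbao   1h   SecretSynced   True
cert-manager      transip-secret           SecretStore   openbao   1h   SecretSynced   True
forgejo           forgejo-backup-s3        SecretStore   openbao   1h   SecretSynced   True
garage            garage-secrets           SecretStore   openbao   1h   SecretSynced   True
longhorn-system   longhorn-backup-secret   SecretStore   openbao   1h   SecretSynced   True
monitoring        alertmanager-smtp        SecretStore   openbao   1h   SecretSynced   True
talos-backup      talos-backup-s3          SecretStore   openbao   1h   SecretSynced   True
EOF
```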

The cleanup also included moving the SecretStore from Part 1 (secretstore-openbao.yaml in apps/openbao/) to infra/certificates/secretstore-cert-manager.yaml, where it belongs alongside the other cert-manager configuration. And README files were added to every directory to document what lives there, what's managed by ESO, and why certain secrets are deliberately excluded.

What's Working Now

✅ All 8 ExternalSecrets synced across 7 namespaces
✅ Kubernetes auth method — no static tokens, no expiry, no rotation required
✅ Authelia login working post-migration with chart secrets and user database from OpenBao
✅ Alertmanager email delivery confirmed via test alert
✅ Forgejo and Talos backup jobs verified via manual runs
✅ No gitignored credential files — repository fully committed
⚠️ Known gap: OpenBao requires manual unseal after every restart — covered in Part 3

What's Next

The remaining piece of the secret management arc is auto-unseal. Right now, every OpenBao restart requires manually providing two of three Shamir key shares before any secret-dependent workload can start. That's a significant operational burden — not just inconvenient, but a real availability problem if the cluster restarts unexpectedly.

Part 3 addresses this by deploying a second OpenBao instance on Proxmox as a Transit KMS. Bletchley's OpenBao calls it at startup to decrypt its master key automatically. The result: planned restarts become fully automatic, and only a simultaneous restart of both systems requires manual intervention.

Questions or suggestions? Leave a comment below or reach out at igor@vluwte.nl.