Longhorn – Kubernetes-native distributed storage
The other day, I took another look at Longhorn. I had briefly looked at it earlier, as a way to get volumes that are writeable from more than one node, and dismissed it then, because I mistook it for nothing more than a glorified NFS server. I was quite wrong.
At the time of writing, I already know I’m going to make fundamental changes: I will not run with a 2-node cluster permanently, because the latency I have between the nodes makes that a questionable idea. However, Longhorn will play a vital role in my DR strategy.
I wanted to play with Longhorn in a multi-node cluster before going there, though, because once I implement my proper DR strategy, my 2nd node will be busy being a cluster of its own.
Features of Longhorn
Longhorn has, in a way, many similarities with ZFS, but is made for a distributed environment like Kubernetes. In a nutshell, Longhorn provisions block devices out of a pool – or several; I have an SSD pool and an HDD pool. You create storage classes using those pools, with the properties you like. A storage class is sort of a template for a volume, describing what properties it should get when it’s created. You can still change them afterwards, though. Longhorn also comes with a decent web console, making it easy to get an overview of – and manage – your Longhorn storage. It has built-in support for snapshot-based backups, most commonly to S3 (or compatible) buckets.
First, you need to decide how/if you want to replicate volumes. For the sake of experimenting with Longhorn, I have been running more or less a two-node system for a while, though the latency between my on-prem node and the cloud makes it less than ideal. I’ll probably end up doing something different, a separate DR cluster, but more on that in another blog post.
I have chosen two replicas. Longhorn does master/slave replication only, but it is easy to promote a replica to master, and most often that happens automatically when a POD wants to mount the volume on one of the nodes.
Installing Longhorn
I provision Longhorn through ArgoCD, of course, like everything else:
helmCharts:
  - name: longhorn
    repo: https://charts.longhorn.io
    version: 1.9.1 # pin a version; "*" is not supported here
    releaseName: longhorn
    namespace: longhorn-system
    valuesInline:
      defaultSettings:
        guaranteedEngineManagerCPU: 250
        guaranteedReplicaManagerCPU: 250
        defaultReplicaCount: 2
        defaultDataLocality: best-effort
        replicaAutoBalance: disabled
        concurrentReplicaRebuildPerNodeLimit: 1
        concurrentBackupRestorePerNodeLimit: 1
        concurrentAutomaticEngineUpgradePerNodeLimit: 1
        replicaReplenishmentWaitInterval: 600
        taintToleration: "dedicated=remote:NoSchedule"
      defaultBackupStore:
        backupTarget: "s3://longhorn-backups@minio/"
        backupTargetCredentialSecret: longhorn-backup-secret
      preUpgradeChecker:
        jobEnabled: false
      service:
        ui:
          type: ClusterIP
The minio bucket, credentials and so on I also configure in the ArgoCD app, with a minio job that provisions the user, bucket and access policy, but that’s out of scope for this blog post.
Once Longhorn is installed, it creates objects of type node.longhorn.io for all nodes it can find. You can then check those objects into ArgoCD as well, adding or overriding properties.
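To see what Longhorn generated before checking anything in, you can list those custom resources with plain kubectl:

kubectl get nodes.longhorn.io -n longhorn-system
kubectl get nodes.longhorn.io -n longhorn-system hassio -o yaml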
Configuring Longhorn
So, let’s start off by defining some properties on my two nodes. These are the settings I care about, so this is what I have checked into the repository and manage through ArgoCD:
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: hassio
  namespace: longhorn-system
spec:
  allowScheduling: true
  disks:
    hdd-disk:
      path: /var/lib/longhorn_hdd
      allowScheduling: true
      diskType: filesystem
      tags: ["hdd"]
    ssd-disk:
      path: /var/lib/longhorn_ssd
      allowScheduling: true
      diskType: filesystem
      tags: ["ssd"]
  name: hassio
  tags:
    - primary
My node remote has a nearly identical specification, just with the tag dr instead. As you can see, I have mounted two volumes (ZFS volumes, actually, in my case) in the right places. Longhorn doesn’t know or care about that; it just knows that the ssd tag goes to longhorn_ssd and the hdd tag goes to longhorn_hdd.
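For reference, a sketch of what the remote node object looks like, assuming the same mount points as on hassio:

apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: remote
  namespace: longhorn-system
spec:
  allowScheduling: true
  disks:
    hdd-disk:
      path: /var/lib/longhorn_hdd   # assumed identical disk layout
      allowScheduling: true
      diskType: filesystem
      tags: ["hdd"]
    ssd-disk:
      path: /var/lib/longhorn_ssd
      allowScheduling: true
      diskType: filesystem
      tags: ["ssd"]
  name: remote
  tags:
    - dr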
Then I need to define some storage classes. For my database (or database-like, such as Valkey) workloads, I have this storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-db-local-ssd
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "2"
  dataLocality: "best-effort"
  fsType: "xfs"
  diskSelector: "ssd"
  recurringJobSelector: |
    [
      {"name":"db-snap-5m","isGroup":false},
      {"name":"db-backup-15m","isGroup":false},
      {"name":"db-backup-daily","isGroup":false}
    ]
Two replicas is always wise; then you can afford to lose one. If the node holding the master replica goes down, Longhorn will promote the remaining replica to master and the workload can fail over to that node.
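For bulk data on the HDD pool I have a similar class, the longhorn-rwo-local-hdd you’ll see in the PVC listing further down. Roughly like this, with the parameters being my assumptions mirrored from the SSD class; only the diskSelector really differs:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-rwo-local-hdd
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "2"
  dataLocality: "best-effort"
  fsType: "xfs"          # assumption, pick whatever filesystem you prefer
  diskSelector: "hdd"    # this is what sends the volume to the hdd-tagged disks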
Using Longhorn
The PVC is provisioned by the mariadb-operator resource wordpress-db, and ends up looking like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
    volume.kubernetes.io/selected-node: remote
    volume.kubernetes.io/storage-provisioner: driver.longhorn.io
  creationTimestamp: "2025-08-30T15:07:58Z"
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: wordpress-db
    app.kubernetes.io/name: mariadb
    pvc.k8s.mariadb.com/role: storage
    recurring-job-group.longhorn.io/default: enabled
  name: storage-wordpress-db-0
  namespace: wordpress
  resourceVersion: "43341887"
  uid: 5f30bb14-00a5-4e7a-89cf-75908bd946e6
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: longhorn-db-local-ssd
  volumeMode: Filesystem
  volumeName: pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound
You won’t see many Longhorn properties there, because the PVC doesn’t carry them; those live in Longhorn’s own custom resource, so let’s look at the Longhorn volume:
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  creationTimestamp: "2025-08-30T15:07:58Z"
  finalizers:
    - longhorn.io
  generation: 24
  labels:
    backup-target: default
    longhornvolume: pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
    recurring-job-group.longhorn.io/wordpress: enabled
    setting.longhorn.io/remove-snapshots-during-filesystem-trim: ignored
    setting.longhorn.io/replica-auto-balance: ignored
    setting.longhorn.io/snapshot-data-integrity: ignored
  name: pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
  namespace: longhorn-system
  resourceVersion: "44623154"
  uid: b19b92aa-c498-4e19-bd7e-2c052d852386
spec:
  Standby: false
  accessMode: rwo
  backingImage: ""
  backupCompressionMethod: lz4
  backupTargetName: default
  dataEngine: v1
  dataLocality: best-effort
  dataSource: ""
  disableFrontend: false
  diskSelector:
    - ssd
  encrypted: false
  freezeFilesystemForSnapshot: ignored
  fromBackup: ""
  frontend: blockdev
  image: longhornio/longhorn-engine:v1.9.1
  lastAttachedBy: ""
  migratable: false
  migrationNodeID: ""
  nodeID: hassio
  nodeSelector: []
  numberOfReplicas: 2
  offlineRebuilding: ignored
  replicaAutoBalance: least-effort
  replicaDiskSoftAntiAffinity: ignored
  replicaSoftAntiAffinity: ignored
  replicaZoneSoftAntiAffinity: ignored
  restoreVolumeRecurringJob: ignored
  revisionCounterDisabled: true
  size: "10737418240"
  snapshotDataIntegrity: ignored
  snapshotMaxCount: 250
  snapshotMaxSize: "0"
  staleReplicaTimeout: 2880
  unmapMarkSnapChainRemoved: ignored
There are tons of settings you can tune here. Some of them, like the number of replicas, can be changed on the fly; others, like moving the volume to a different disk, cannot.
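For example, the replica count can be bumped on a live volume by patching the Longhorn Volume object directly; a sketch:

kubectl -n longhorn-system patch volumes.longhorn.io \
  pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 \
  --type merge -p '{"spec":{"numberOfReplicas":3}}'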
You can see that the storage class has given it two replicas and put it on SSD disks, which is what I intended.
I have a quite similar volume for a wordpress-files PVC. Once these are provisioned, using them is not much different from using any other PVC. But since it’s Longhorn, it doesn’t matter which node I start the POD on; the volume is attached on whatever node the POD starts, with that node’s replica as the master. Any changes are synchronized to the other replica synchronously, so there is a write penalty for having replicas.
If you stop a POD, the volumes it holds will (usually, if you have configured it so) go into a detached state, ready to be attached on any node where they have a replica.
In fact, let’s test this quickly while I write this. Let’s do a switchover of the blog to the remote node.
To test this, you can cordon the node you’re moving away from. That tells the Kubernetes scheduler not to place any new PODs on that node.
hassio% kubectl cordon hassio
node/hassio cordoned
hassio% kubectl get pvc -n wordpress
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
storage-wordpress-db-0 Bound pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 10Gi RWO longhorn-db-local-ssd 7d7h
storage-wordpress-db-dr-0 Bound wordpress-db-dr 10Gi RWO longhorn-db-local-ssd 3d15h
wordpress-files Bound pvc-7241107b-5109-4d29-a36f-663c56de8a98 100Gi RWO longhorn-rwo-local-hdd 7d10h
wordpress-files-dr Bound wordpress-files-dr 100Gi RWO longhorn 3d16h
hassio% kubectl get volume -n longhorn-system pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 v1 attached healthy 10737418240 hassio 7d7h
hassio% kubectl get volume -n longhorn-system pvc-7241107b-5109-4d29-a36f-663c56de8a98
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-7241107b-5109-4d29-a36f-663c56de8a98 v1 attached healthy 107374182400 hassio 7d10h
hassio% kubectl get pod -n wordpress wordpress-db-0 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-db-0 1/1 Running 0 5d9h 10.151.254.66 hassio
hassio% kubectl get pod -n wordpress wordpress-app-65c649644d-hhwzl -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-app-65c649644d-hhwzl 1/1 Running 0 5d9h 10.151.254.127 hassio
hassio% kubectl rollout restart -n wordpress statefulset wordpress-db
statefulset.apps/wordpress-db restarted
hassio% kubectl rollout restart -n wordpress deployment wordpress-app
deployment.apps/wordpress-app restarted
<....wait a minute or so...>
hassio% kubectl get pod -n wordpress wordpress-db-0 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-db-0 1/1 Running 0 71s 10.151.24.28 remote
hassio% kubectl get pod -n wordpress wordpress-app-79cf75c54f-27hnd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-app-79cf75c54f-27hnd 1/1 Running 0 2m 10.151.24.30 remote
hassio% kubectl get volume -n longhorn-system pvc-7241107b-5109-4d29-a36f-663c56de8a98
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-7241107b-5109-4d29-a36f-663c56de8a98 v1 attached healthy 107374182400 remote 7d10h
hassio% kubectl get volume -n longhorn-system pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 v1 attached healthy 10737418240 remote 7d7h
There – workloads switched over, and easily found the same volumes on the other side.
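Remember to uncordon the node afterwards, otherwise nothing will be scheduled back onto it:

kubectl uncordon hassio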
Backup
Backup is built into Longhorn. I already showed how I specified the default backup destination in the Helm values.
You still need to define backup schedules, though. If you have more complex needs, you can also have volumes backed up to different backup destinations.
Finally, if you have a DR environment somewhere, you can replicate the backups through other means (S3 replication, rclone, …), and the DR side will happily accept them as a backup target ready to be restored from. You can define volumes that get bootstrapped with data from a backup when they are created, and you can define a volume as a standby volume so that it’s ready to be fired up in an instant. Let’s sidestep into setting up exactly such a DR volume for my wordpress database.
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: wordpress-db-dr
  namespace: longhorn-system
spec:
  Standby: true
  accessMode: rwo
  backingImage: ""
  backupCompressionMethod: lz4
  backupTargetName: dr
  dataLocality: best-effort
  frontend: blockdev
  diskSelector:
    - ssd
  fromBackup: s3://longhorn-backups@minio/?backup=backup-ec6eb47329464749&volume=pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
  nodeSelector:
    - dr
  numberOfReplicas: 1
  size: "10737418240"
There’s no support for doing this through dynamic provisioning from a PVC, so in this case you need to create the PV manually and pre-bind it:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wordpress-db-dr
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  persistentVolumeReclaimPolicy: Retain
  claimRef: # <- pre-bind to the PVC the operator will create
    namespace: wordpress
    name: storage-wordpress-db-dr-0
  csi:
    driver: driver.longhorn.io
    volumeHandle: wordpress-db-dr # <- your Longhorn volume name
    fsType: xfs
The PVC will in this case be created by mariadb-operator, but because of the claimRef in the PV, it will find and bind to the PV once it’s created.
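Once the operator has created the PVC, a quick check should show it Bound to the wordpress-db-dr volume, as in the PVC listing earlier:

kubectl -n wordpress get pvc storage-wordpress-db-dr-0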
But back to backups. As you noticed in the wordpress-db-dr volume, I have specified that it pulls the backup from a different backup destination than the one where it’s backed up. You can create additional backup targets:
apiVersion: longhorn.io/v1beta2
kind: BackupTarget
metadata:
  name: dr
  namespace: longhorn-system
spec:
  backupTargetURL: s3://longhorn-backups@minio-dr/
  credentialSecret: longhorn-backup-secret-dr
  pollInterval: "300s"
The actual S3 endpoint and the credentials for the backup target are specified in the longhorn-backup-secret-dr secret:
hassio% kubectl get secrets -n longhorn-system longhorn-backup-secret-dr -o yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID:
  AWS_ENDPOINTS:
  AWS_SECRET_ACCESS_KEY:
kind: Secret
metadata:
  annotations:
    longhorn.io/backup-target: s3://longhorn-backups@minio-dr
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "minio"
  creationTimestamp: "2025-09-02T16:25:45Z"
  labels:
    io.portainer.kubernetes.configuration.owner: ""
    io.portainer.kubernetes.configuration.owner.id: ""
  name: longhorn-backup-secret-dr
  namespace: longhorn-system
  ownerReferences:
    - apiVersion: bitnami.com/v1alpha1
      controller: true
      kind: SealedSecret
      name: longhorn-backup-secret-dr
      uid: 9a646781-cbcf-4ade-946b-3642ae5f7c0e
  resourceVersion: "41763673"
  uid: d9c1fded-5c89-4a26-b1ec-f9d59b969f30
type: Opaque
As you can see, I have checked it in as a SealedSecret, and I replicate it to the minio namespace with https://github.com/emberstack/kubernetes-reflector, since I also use these credentials to set up the user in minio.
The DR backup target is periodically synced from my primary backup target by a cron job. I won’t go into detail on this here, but once I get around to setting up a separate DR cluster, this will be a key component, synchronizing backups continuously.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: longhorn-backups-sync
  namespace: longhorn-system
spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rclone
              image: rclone/rclone:latest
              args:
                - sync
                - primary:longhorn-backups
                - dr:longhorn-backups
                - --s3-chunk-size=64M
                - --s3-upload-concurrency=4
                - --transfers=4
                - --checkers=8
                - --s3-acl=private
                - --no-update-modtime
                - --delete-after
              env:
                # primary remote
                - { name: RCLONE_CONFIG_PRIMARY_TYPE, value: s3 }
                - { name: RCLONE_CONFIG_PRIMARY_PROVIDER, value: Minio }
                - { name: RCLONE_CONFIG_PRIMARY_ACCESS_KEY_ID, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_ACCESS_KEY_ID }}}
                - { name: RCLONE_CONFIG_PRIMARY_SECRET_ACCESS_KEY, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_SECRET_ACCESS_KEY }}}
                - { name: RCLONE_CONFIG_PRIMARY_ENDPOINT, value: "http://minioprod.minio.svc.cluster.local:9000" }
                - { name: RCLONE_CONFIG_PRIMARY_ACL, value: private }
                # dr remote
                - { name: RCLONE_CONFIG_DR_TYPE, value: s3 }
                - { name: RCLONE_CONFIG_DR_PROVIDER, value: Minio }
                - { name: RCLONE_CONFIG_DR_ACCESS_KEY_ID, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_ACCESS_KEY_ID }}}
                - { name: RCLONE_CONFIG_DR_SECRET_ACCESS_KEY, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_SECRET_ACCESS_KEY }}}
                - { name: RCLONE_CONFIG_DR_ENDPOINT, value: "http://miniodr.minio.svc.cluster.local:9000" }
                - { name: RCLONE_CONFIG_DR_ACL, value: private }
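If you want a sync right now instead of waiting for the schedule, for example right before a planned switchover, you can fire off a one-shot job from the CronJob (the job name is arbitrary):

kubectl -n longhorn-system create job --from=cronjob/longhorn-backups-sync backups-sync-manual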
Backup schedules
Backup schedules are defined with custom resources. These are the ones I use for wordpress, and as you can see, I use the same ones for several other workloads.
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: web-snap-15m
  namespace: longhorn-system
spec:
  task: snapshot
  cron: "*/15 * * * *"
  retain: 96
  concurrency: 1
  groups: ["nextcloud","wordpress","paperless","gitea","bookstack"]
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: web-backup-30m
  namespace: longhorn-system
spec:
  task: backup
  cron: "*/30 * * * *"
  retain: 48
  concurrency: 1
  groups: ["nextcloud","wordpress","paperless","gitea","bookstack"]
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: web-backup-daily
  namespace: longhorn-system
spec:
  task: backup
  cron: "0 1 * * *" # 01:00 UTC (~03:00 Oslo most of the year)
  retain: 30
  concurrency: 1
  groups: ["nextcloud","wordpress","paperless","gitea","bookstack"]
To assign a backup schedule to a volume, you set a label on the Longhorn volume:
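That can be done with plain kubectl; this is the label you’ll see on the volume below:

kubectl -n longhorn-system label volumes.longhorn.io \
  pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 \
  recurring-job-group.longhorn.io/wordpress=enabled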
hassio% kubectl get volume -n longhorn-system pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 --show-labels
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE LABELS
pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 v1 attached healthy 10737418240 hassio 7d1h backup-target=default,longhornvolume=pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6,recurring-job-group.longhorn.io/wordpress=enabled,setting.longhorn.io/remove-snapshots-during-filesystem-trim=ignored,setting.longhorn.io/replica-auto-balance=ignored,setting.longhorn.io/snapshot-data-integrity=ignored
...and that’s about it! Now Longhorn will make sure snapshots and backups are taken according to the wordpress schedule. Since I label all wordpress-related volumes with the same recurring-job-group, their snapshots and backups are taken at roughly the same time.
But how do we actually use the backups to restore? Well, you basically need to stop the POD(s) using the volume, and then attach it in maintenance mode on a node. Then you can do operations like rolling back to a certain snapshot or restoring from a backup.
The Longhorn operations are easy to do through the GUI, although some CLI tools exist as well. In the GUI you can browse through all the volumes and perform maintenance operations (backups, snapshots, mounting, attaching, …), making it easy to manage your storage.
I find the web interface intuitive enough that it’s worth using, so I won’t describe it or write a tutorial on what you can do there. You can do most things from it, but I prefer checking permanent configuration into gitea and managing it with ArgoCD, as I do for the rest of my Kubernetes cluster.
Summary
I have come to quite like Longhorn, so much that I have converted all my volumes to it. The migration strategy was pretty brute force: I created new volumes, mounted both the old and the new one in a POD, and rsync’ed the old content into the new one, roughly as sketched below. I’ll demonstrate more of the features in a future blog post, when I build my separate DR cluster on remote.
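A minimal sketch of such a migration POD, with hypothetical PVC names:

apiVersion: v1
kind: Pod
metadata:
  name: volume-migrate
  namespace: wordpress
spec:
  restartPolicy: Never
  containers:
    - name: rsync
      image: instrumentisto/rsync-ssh   # any image that ships rsync will do
      command: ["rsync", "-aHAX", "--numeric-ids", "/old/", "/new/"]
      volumeMounts:
        - { name: old, mountPath: /old }
        - { name: new, mountPath: /new }
  volumes:
    - name: old
      persistentVolumeClaim:
        claimName: wordpress-files-old   # hypothetical: the pre-Longhorn PVC
    - name: new
      persistentVolumeClaim:
        claimName: wordpress-files       # the new Longhorn-backed PVC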