Longhorn – Kubernetes-native distributed storage
The other day, I took another look at Longhorn. I had briefly looked at it earlier, as a way to get volumes that are writeable from more than one node, and dismissed it then, because I mistook it for nothing more than a glorified NFS server. I was quite wrong.
At the time of writing, I already know I’m going to make fundamental changes: I will not run with a 2-node cluster permanently, because the latency I have between the nodes makes that a questionable idea. However, Longhorn will play a vital role in my DR strategy.
I wanted to play with Longhorn in a multi-node cluster before going there, though, because once I implement my proper DR strategy, my 2nd node will be busy being a cluster of its own.
Features of Longhorn
Longhorn has, in a way, many similarities with ZFS, but is made for a distributed environment like Kubernetes. In a nutshell, Longhorn provisions block devices out of a pool – or several; I have an SSD pool and an HDD pool. You create storage classes using those pools, with the properties you like. A storage class is sort of a template for a volume, describing what properties it should get when it’s created. You can still change them afterwards, though. Longhorn also comes with a decent web console, making it easy to get an overview of – and manage – your Longhorn storage. It has built-in support for snapshot-based backups, most commonly to S3 (or compatible) buckets.
First, you need to decide how/if you want to replicate volumes. For the sake of experimenting with Longhorn, I have been running more or less a two-node system for a while, though the latency between my on-prem node and the cloud makes it less than ideal. I’ll probably end up doing something different, a separate DR cluster, but more on that in another blog post.
I have chosen two replicas. Longhorn does master/slave replication only, but it is easy to promote a replica to master, and most often that happens automatically when a POD wants to mount the volume on one of the nodes.
Installing Longhorn
I provision Longhorn through ArgoCD, of course, like everything else:
helmCharts:
  - name: longhorn
    repo: https://charts.longhorn.io
    version: 1.9.1 # pin a version; "*" is not supported here
    releaseName: longhorn
    namespace: longhorn-system
    valuesInline:
      defaultSettings:
        guaranteedEngineManagerCPU: 250
        guaranteedReplicaManagerCPU: 250
        defaultReplicaCount: 2
        defaultDataLocality: best-effort
        replicaAutoBalance: disabled
        concurrentReplicaRebuildPerNodeLimit: 1
        concurrentBackupRestorePerNodeLimit: 1
        concurrentAutomaticEngineUpgradePerNodeLimit: 1
        replicaReplenishmentWaitInterval: 600
        taintToleration: "dedicated=remote:NoSchedule"
      defaultBackupStore:
        backupTarget: "s3://longhorn-backups@minio/"
        backupTargetCredentialSecret: longhorn-backup-secret
      preUpgradeChecker:
        jobEnabled: false
      service:
        ui:
          type: ClusterIP
The minio bucket, credentials and so on I also configure in the ArgoCD app, with a minio job that provisions the user, bucket and access policy, but that’s out of scope for this blog post.
Once Longhorn is installed, it creates objects of type node.longhorn.io for all nodes it can find. You can then check those objects into ArgoCD as well, adding or overriding properties.
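To see what Longhorn generated before checking anything in, you can list those custom resources with plain kubectl:

kubectl get nodes.longhorn.io -n longhorn-system
kubectl get nodes.longhorn.io -n longhorn-system hassio -o yaml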
Configuring Longhorn
So, let’s start off by defining some properties on my two nodes. These are the settings I care about, so this is what I have checked into the repository and manage through ArgoCD:
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: hassio
  namespace: longhorn-system
spec:
  allowScheduling: true
  disks:
    hdd-disk:
      path: /var/lib/longhorn_hdd
      allowScheduling: true
      diskType: filesystem
      tags: ["hdd"]
    ssd-disk:
      path: /var/lib/longhorn_ssd
      allowScheduling: true
      diskType: filesystem
      tags: ["ssd"]
  name: hassio
  tags:
    - primary
My node remote has a nearly identical specification, just with the tag dr instead. As you can see, I have mounted two volumes (ZFS volumes, actually, in my case) in the right places. Longhorn doesn’t know or care about that; it just knows that the ssd tag goes to longhorn_ssd and the hdd tag goes to longhorn_hdd.
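For reference, a sketch of what the remote node object looks like, assuming the same mount points as on hassio:

apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: remote
  namespace: longhorn-system
spec:
  allowScheduling: true
  disks:
    hdd-disk:
      path: /var/lib/longhorn_hdd   # assumed identical disk layout
      allowScheduling: true
      diskType: filesystem
      tags: ["hdd"]
    ssd-disk:
      path: /var/lib/longhorn_ssd
      allowScheduling: true
      diskType: filesystem
      tags: ["ssd"]
  name: remote
  tags:
    - dr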
Then I need to define some storage classes. For my database (or database-like, such as Valkey) workloads, I have this storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-db-local-ssd
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "2"
  dataLocality: "best-effort"
  fsType: "xfs"
  diskSelector: "ssd"
  recurringJobSelector: |
    [
      {"name":"db-snap-5m","isGroup":false},
      {"name":"db-backup-15m","isGroup":false},
      {"name":"db-backup-daily","isGroup":false}
    ]
Two replicas is always wise; then you can afford to lose one. If the node holding the master replica goes down, Longhorn will promote the remaining replica to master and the workload can fail over to that node.
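For bulk data on the HDD pool I have a similar class, the longhorn-rwo-local-hdd you’ll see in the PVC listing further down. Roughly like this, with the parameters being my assumptions mirrored from the SSD class; only the diskSelector really differs:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-rwo-local-hdd
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "2"
  dataLocality: "best-effort"
  fsType: "xfs"          # assumption, pick whatever filesystem you prefer
  diskSelector: "hdd"    # this is what sends the volume to the hdd-tagged disks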
Using Longhorn
The PVC is provisioned by the mariadb-operator resource wordpress-db, and ends up looking like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
    volume.kubernetes.io/selected-node: remote
    volume.kubernetes.io/storage-provisioner: driver.longhorn.io
  creationTimestamp: "2025-08-30T15:07:58Z"
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: wordpress-db
    app.kubernetes.io/name: mariadb
    pvc.k8s.mariadb.com/role: storage
    recurring-job-group.longhorn.io/default: enabled
  name: storage-wordpress-db-0
  namespace: wordpress
  resourceVersion: "43341887"
  uid: 5f30bb14-00a5-4e7a-89cf-75908bd946e6
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: longhorn-db-local-ssd
  volumeMode: Filesystem
  volumeName: pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound
You won’t see many Longhorn properties there, because the PVC doesn’t carry them; those live in Longhorn’s own custom resource, so let’s look at the Longhorn volume:
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  creationTimestamp: "2025-08-30T15:07:58Z"
  finalizers:
    - longhorn.io
  generation: 24
  labels:
    backup-target: default
    longhornvolume: pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
    recurring-job-group.longhorn.io/wordpress: enabled
    setting.longhorn.io/remove-snapshots-during-filesystem-trim: ignored
    setting.longhorn.io/replica-auto-balance: ignored
    setting.longhorn.io/snapshot-data-integrity: ignored
  name: pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
  namespace: longhorn-system
  resourceVersion: "44623154"
  uid: b19b92aa-c498-4e19-bd7e-2c052d852386
spec:
  Standby: false
  accessMode: rwo
  backingImage: ""
  backupCompressionMethod: lz4
  backupTargetName: default
  dataEngine: v1
  dataLocality: best-effort
  dataSource: ""
  disableFrontend: false
  diskSelector:
    - ssd
  encrypted: false
  freezeFilesystemForSnapshot: ignored
  fromBackup: ""
  frontend: blockdev
  image: longhornio/longhorn-engine:v1.9.1
  lastAttachedBy: ""
  migratable: false
  migrationNodeID: ""
  nodeID: hassio
  nodeSelector: []
  numberOfReplicas: 2
  offlineRebuilding: ignored
  replicaAutoBalance: least-effort
  replicaDiskSoftAntiAffinity: ignored
  replicaSoftAntiAffinity: ignored
  replicaZoneSoftAntiAffinity: ignored
  restoreVolumeRecurringJob: ignored
  revisionCounterDisabled: true
  size: "10737418240"
  snapshotDataIntegrity: ignored
  snapshotMaxCount: 250
  snapshotMaxSize: "0"
  staleReplicaTimeout: 2880
  unmapMarkSnapChainRemoved: ignored
There are tons of settings you can tune here. Some of them, like the number of replicas, can be changed on the fly; others, like moving the volume to a different disk, cannot.
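For example, the replica count can be bumped on a live volume by patching the Longhorn Volume object directly; a sketch:

kubectl -n longhorn-system patch volumes.longhorn.io \
  pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 \
  --type merge -p '{"spec":{"numberOfReplicas":3}}'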
You can see that the storage class has given it two replicas and put it on SSD disks, which is what I intended.
I have a quite similar volume for a wordpress-files PVC. Once these are provisioned, using them is not much different from using any other PVC. But since it’s Longhorn, it doesn’t matter which node I start the POD on; the volume is attached on whatever node the POD starts, with that node’s replica as the master. Any changes are synchronized to the other replica synchronously, so there is a write penalty for having replicas.
If you stop a POD, the volumes it holds will (usually, if you have configured it so) go into a detached state, ready to be attached on any node where they have a replica.
In fact, let’s test this quickly while I write this. Let’s do a switchover of the blog to the remote node.
To test this, you can cordon the node you’re moving away from. That tells the Kubernetes scheduler not to place any new PODs on that node.
hassio% kubectl cordon hassio
node/hassio cordoned
hassio% kubectl get pvc -n wordpress
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
storage-wordpress-db-0 Bound pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 10Gi RWO longhorn-db-local-ssd 7d7h
storage-wordpress-db-dr-0 Bound wordpress-db-dr 10Gi RWO longhorn-db-local-ssd 3d15h
wordpress-files Bound pvc-7241107b-5109-4d29-a36f-663c56de8a98 100Gi RWO longhorn-rwo-local-hdd 7d10h
wordpress-files-dr Bound wordpress-files-dr 100Gi RWO longhorn 3d16h
hassio% kubectl get volume -n longhorn-system pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 v1 attached healthy 10737418240 hassio 7d7h
hassio% kubectl get volume -n longhorn-system pvc-7241107b-5109-4d29-a36f-663c56de8a98
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-7241107b-5109-4d29-a36f-663c56de8a98 v1 attached healthy 107374182400 hassio 7d10h
hassio% kubectl get pod -n wordpress wordpress-db-0 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-db-0 1/1 Running 0 5d9h 10.151.254.66 hassio
hassio% kubectl get pod -n wordpress wordpress-app-65c649644d-hhwzl -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-app-65c649644d-hhwzl 1/1 Running 0 5d9h 10.151.254.127 hassio
hassio% kubectl rollout restart -n wordpress statefulset wordpress-db
statefulset.apps/wordpress-db restarted
hassio% kubectl rollout restart -n wordpress deployment wordpress-app
deployment.apps/wordpress-app restarted
<....wait a minute or so...>
hassio% kubectl get pod -n wordpress wordpress-db-0 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-db-0 1/1 Running 0 71s 10.151.24.28 remote
hassio% kubectl get pod -n wordpress wordpress-app-79cf75c54f-27hnd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
wordpress-app-79cf75c54f-27hnd 1/1 Running 0 2m 10.151.24.30 remote
hassio% kubectl get volume -n longhorn-system pvc-7241107b-5109-4d29-a36f-663c56de8a98
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-7241107b-5109-4d29-a36f-663c56de8a98 v1 attached healthy 107374182400 remote 7d10h
hassio% kubectl get volume -n longhorn-system pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 v1 attached healthy 10737418240 remote 7d7h
There – workloads switched over, and easily found the same volumes on the other side.
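Remember to uncordon the node afterwards, otherwise nothing will be scheduled back onto it:

kubectl uncordon hassio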
Backup
Backup is built into Longhorn. I already showed how I specified the default backup destination in the Helm values.
You still need to define backup schedules, though. If you have more complex needs, you can also have volumes backed up to different backup destinations.
Finally, if you have a DR environment somewhere, you can replicate the backups through other means (S3 replication, rclone, …), and the DR side will happily accept them as a backup target ready to be restored from. You can define volumes that get bootstrapped with data from a backup when they are created, and you can define a volume as a standby volume so that it’s ready to be fired up in an instant. Let’s sidestep into setting up exactly such a DR volume for my wordpress database.
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: wordpress-db-dr
  namespace: longhorn-system
spec:
  Standby: true
  accessMode: rwo
  backingImage: ""
  backupCompressionMethod: lz4
  backupTargetName: dr
  dataLocality: best-effort
  frontend: blockdev
  diskSelector:
    - ssd
  fromBackup: s3://longhorn-backups@minio/?backup=backup-ec6eb47329464749&volume=pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6
  nodeSelector:
    - dr
  numberOfReplicas: 1
  size: "10737418240"
There’s no support for doing this through dynamic provisioning from a PVC, so in this case you need to create the PV manually and pre-bind it:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wordpress-db-dr
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  persistentVolumeReclaimPolicy: Retain
  claimRef: # <- pre-bind to the PVC the operator will create
    namespace: wordpress
    name: storage-wordpress-db-dr-0
  csi:
    driver: driver.longhorn.io
    volumeHandle: wordpress-db-dr # <- your Longhorn volume name
    fsType: xfs
The PVC will in this case be created by mariadb-operator, but because of the claimRef in the PV, it will find and bind to the PV once it’s created.
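Once the operator has created the PVC, a quick check should show it Bound to the wordpress-db-dr volume, as in the PVC listing earlier:

kubectl -n wordpress get pvc storage-wordpress-db-dr-0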
But back to backups. As you noticed in the wordpress-db-dr volume, I have specified that it pulls the backup from a different backup destination than the one where it’s backed up. You can create additional backup targets:
apiVersion: longhorn.io/v1beta2
kind: BackupTarget
metadata:
  name: dr
  namespace: longhorn-system
spec:
  backupTargetURL: s3://longhorn-backups@minio-dr/
  credentialSecret: longhorn-backup-secret-dr
  pollInterval: "300s"
The actual S3 endpoint and the credentials for the backup target are specified in the longhorn-backup-secret-dr secret:
hassio% kubectl get secrets -n longhorn-system longhorn-backup-secret-dr -o yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID:
  AWS_ENDPOINTS:
  AWS_SECRET_ACCESS_KEY:
kind: Secret
metadata:
  annotations:
    longhorn.io/backup-target: s3://longhorn-backups@minio-dr
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "minio"
  creationTimestamp: "2025-09-02T16:25:45Z"
  labels:
    io.portainer.kubernetes.configuration.owner: ""
    io.portainer.kubernetes.configuration.owner.id: ""
  name: longhorn-backup-secret-dr
  namespace: longhorn-system
  ownerReferences:
    - apiVersion: bitnami.com/v1alpha1
      controller: true
      kind: SealedSecret
      name: longhorn-backup-secret-dr
      uid: 9a646781-cbcf-4ade-946b-3642ae5f7c0e
  resourceVersion: "41763673"
  uid: d9c1fded-5c89-4a26-b1ec-f9d59b969f30
type: Opaque
As you can see, I have checked it in as a SealedSecret, and I replicate it to the minio namespace with https://github.com/emberstack/kubernetes-reflector, since I also use these credentials to set up the user in minio.
The DR backup target is periodically synced from my primary backup target by a cron job. I won’t go into detail on this here, but once I get around to setting up a separate DR cluster, this will be a key component, synchronizing backups continuously.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: longhorn-backups-sync
  namespace: longhorn-system
spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rclone
              image: rclone/rclone:latest
              args:
                - sync
                - primary:longhorn-backups
                - dr:longhorn-backups
                - --s3-chunk-size=64M
                - --s3-upload-concurrency=4
                - --transfers=4
                - --checkers=8
                - --s3-acl=private
                - --no-update-modtime
                - --delete-after
              env:
                # primary remote
                - { name: RCLONE_CONFIG_PRIMARY_TYPE, value: s3 }
                - { name: RCLONE_CONFIG_PRIMARY_PROVIDER, value: Minio }
                - { name: RCLONE_CONFIG_PRIMARY_ACCESS_KEY_ID, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_ACCESS_KEY_ID }}}
                - { name: RCLONE_CONFIG_PRIMARY_SECRET_ACCESS_KEY, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_SECRET_ACCESS_KEY }}}
                - { name: RCLONE_CONFIG_PRIMARY_ENDPOINT, value: "http://minioprod.minio.svc.cluster.local:9000" }
                - { name: RCLONE_CONFIG_PRIMARY_ACL, value: private }
                # dr remote
                - { name: RCLONE_CONFIG_DR_TYPE, value: s3 }
                - { name: RCLONE_CONFIG_DR_PROVIDER, value: Minio }
                - { name: RCLONE_CONFIG_DR_ACCESS_KEY_ID, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_ACCESS_KEY_ID }}}
                - { name: RCLONE_CONFIG_DR_SECRET_ACCESS_KEY, valueFrom: { secretKeyRef: { name: longhorn-backup-secret, key: AWS_SECRET_ACCESS_KEY }}}
                - { name: RCLONE_CONFIG_DR_ENDPOINT, value: "http://miniodr.minio.svc.cluster.local:9000" }
                - { name: RCLONE_CONFIG_DR_ACL, value: private }
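If you want a sync right now instead of waiting for the schedule, for example right before a planned switchover, you can fire off a one-shot job from the CronJob (the job name is arbitrary):

kubectl -n longhorn-system create job --from=cronjob/longhorn-backups-sync backups-sync-manual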
Backup schedules
Backup schedules are defined with custom resources. These are the ones I use for wordpress, and as you can see, I use the same ones for several other workloads.
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: web-snap-15m
  namespace: longhorn-system
spec:
  task: snapshot
  cron: "*/15 * * * *"
  retain: 96
  concurrency: 1
  groups: ["nextcloud","wordpress","paperless","gitea","bookstack"]
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: web-backup-30m
  namespace: longhorn-system
spec:
  task: backup
  cron: "*/30 * * * *"
  retain: 48
  concurrency: 1
  groups: ["nextcloud","wordpress","paperless","gitea","bookstack"]
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: web-backup-daily
  namespace: longhorn-system
spec:
  task: backup
  cron: "0 1 * * *" # 01:00 UTC (~03:00 Oslo most of the year)
  retain: 30
  concurrency: 1
  groups: ["nextcloud","wordpress","paperless","gitea","bookstack"]
To assign a backup schedule to a volume, you set a label on the Longhorn volume:
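That can be done with plain kubectl; this is the label you’ll see on the volume below:

kubectl -n longhorn-system label volumes.longhorn.io \
  pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 \
  recurring-job-group.longhorn.io/wordpress=enabled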
hassio% kubectl get volume -n longhorn-system pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 --show-labels
NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE LABELS
pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6 v1 attached healthy 10737418240 hassio 7d1h backup-target=default,longhornvolume=pvc-5f30bb14-00a5-4e7a-89cf-75908bd946e6,recurring-job-group.longhorn.io/wordpress=enabled,setting.longhorn.io/remove-snapshots-during-filesystem-trim=ignored,setting.longhorn.io/replica-auto-balance=ignored,setting.longhorn.io/snapshot-data-integrity=ignored
...and that’s about it! Now Longhorn will make sure snapshots and backups are taken according to the wordpress schedule. Since I label all wordpress-related volumes with the same recurring-job-group, their snapshots and backups are taken at roughly the same time.
But how do we actually use the backups to restore? Well, you basically need to stop the POD(s) using the volume, and then attach it in maintenance mode on a node. Then you can do operations like rolling back to a certain snapshot or restoring from a backup.
The Longhorn operations are easy to do through the GUI, although some CLI tools exist as well. In the GUI you can browse through all the volumes and perform maintenance operations (backups, snapshots, mounting, attaching, …), making it easy to manage your storage.
I find the web interface intuitive enough that it’s worth using, so I won’t describe it or write a tutorial on what you can do there. You can do most things from it, but I prefer checking permanent configuration into gitea and managing it with ArgoCD, as I do for the rest of my Kubernetes cluster.
Summary
I have come to quite like Longhorn, so much that I have converted all my volumes to it. The migration strategy was pretty brute force: I created new volumes, mounted both the old and the new one in a POD, and rsync’ed the old content into the new one, roughly as sketched below. I’ll demonstrate more of the features in a future blog post, when I build my separate DR cluster on remote.
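A minimal sketch of such a migration POD, with hypothetical PVC names:

apiVersion: v1
kind: Pod
metadata:
  name: volume-migrate
  namespace: wordpress
spec:
  restartPolicy: Never
  containers:
    - name: rsync
      image: instrumentisto/rsync-ssh   # any image that ships rsync will do
      command: ["rsync", "-aHAX", "--numeric-ids", "/old/", "/new/"]
      volumeMounts:
        - { name: old, mountPath: /old }
        - { name: new, mountPath: /new }
  volumes:
    - name: old
      persistentVolumeClaim:
        claimName: wordpress-files-old   # hypothetical: the pre-Longhorn PVC
    - name: new
      persistentVolumeClaim:
        claimName: wordpress-files       # the new Longhorn-backed PVC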