Proxmox vs. raw bhyve on FreeBSD feels like an odd comparison. Proxmox feels like dollar-store ESXi (though the container stuff that feels like VMs is a nice trick), while FreeBSD+bhyve feels more analogous to any distro running QEMU/KVM, because you lack the administration tools Proxmox has.
I presume the performance of the VMs is largely independent of the administrative tools running. I couldn't find a huge amount about Proxmox, but it looked as if they built their own virtual machine infrastructure on KVM rather than just using QEMU, but I could be wrong.
From their other posts, it looks as if a lot of their FreeBSD migration has been moving away from higher-level abstractions (e.g. moving from containers to directly administering jails, which is not something I’d willingly do), so this may be part of the same thing.
Proxmox uses QEMU as the hypervisor for virtual machines and LXC for containers. They just don't use libvirt as the management interface and have created their own management tools around QEMU. They do upstream QEMU patches quite frequently, however, and carry some patchsets of their own for the backup API they use to create live backups to the Proxmox Backup Server (amongst other things), so they ship a slightly patched version of QEMU.
I don't know much about FreeBSD, but I find it amazing that it apparently competes well with Linux, which gets much more attention.
Of course, if some of the better performance numbers were due to ignored fsyncs, as suggested, that would not be acceptable for many workloads. And for those for which it is acceptable, it would be nice to see the fsyncs switched off on Linux as well.
Similarly, that ZFS potentially outperforms ext4 more or less across the board is remarkable. That you can get all the great features without a performance malus is amazing.
I’m quite surprised by that. UFS on FreeBSD typically outperforms ZFS on most workloads, but I go with ZFS because the features are worth it for the slowdown. My understanding was that EXT2 was fairly similar to the original FFS (UFS1) with a few tweaks and that UFS2 and EXT4 eventually converged on quite similar designs, albeit via different routes, so I’d have expected EXT4 to outperform ZFS.
That said, this is on NVMe, and ZFS's design tends to work fairly nicely with fast flash. Data is written to the ZIL and then to disk in contiguous runs. Reads may be fairly random, but flash tends to perform very well for sequential writes and has much less of a penalty for random reads than spinning rust, and the larger block size of ZFS makes those random reads nowhere near as bad as for fragmented UFS.
Another issue is that ZFS has its own block cache that doesn't properly integrate with the kernel's main page cache, so it wastes memory on duplicated pages and its I/O path has more copying.
That hasn’t been true on FreeBSD for about ten years. The buffer cache has a notion of non-owned pages, so the same page can be in the buffer cache and ARC.
That’s not really true on Linux either. Some small number of pages may be duplicated, but for most pages it doesn’t happen. And ZFS’s ARC is so much better than whatever the Linux devs can come up with, that it’s worth it.
ZFS +1. I run TrueNAS (FreeBSD) as a VM on my Proxmox server exactly for that reason! Love it!
Downside is that I don't get SMART from my machine now, but that's because I'm using my RAID card as a controller only and not as RAID. Too cheap right now to buy an HBA card. To be clear, that's not FreeBSD's fault but a hardware configuration that I'm not (yet) changing.
I found it strange/perhaps disconcerting that the tests were labeled things like "VM on FreeBSD (ZFS, NVMe)" compared to "VM on Proxmox (ZFS)"… were these using the same hardware? I thought at the beginning of the article it said they were, so perhaps this is just "using a dedicated NVMe driver" in FreeBSD, but that option does not exist in the other? It's hard to say exactly.
Also it kind of ends on a sour uninvestigated note about there potentially being a bug in the FreeBSD NVMe driver which lies about fsync operations and thus could artificially be inflating the benchmark numbers, but this is kind of just hand-waved over. I’m sorry, what? That seems pretty key to what we’re measuring here.
The NVMe there means the VM's storage is exposed as an emulated NVMe disk, rather than a VirtIO disk. From the Klara benchmarks, the emulated NVMe device from bhyve is much faster than VirtIO storage. I don't know if Proxmox has this option, but they didn't test it if it does.
"Also it kind of ends on a sour uninvestigated note about there potentially being a bug in the FreeBSD NVMe driver which lies about fsync operations"
Yes, that was a bit weird. They didn't say that there was a bug, only that there might be. If I remember NVMe correctly (reasonable chance that I don't): it's perfectly acceptable to coalesce sync operations, as long as you don't report them as complete too early. The NVMe interface allows more reordering than VirtIO (which is most of why it's faster), so it's also possible that they're seeing multiple sync commands in the queue, coalescing them into a single sync to the disk, and reporting them as completed simultaneously. I think NVMe has different kinds of operations for 'everything in the queue must be on disk by the time this reports completed' and 'the previous flush must have completed before you do the next write'. Coalescing the former is fine.
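To illustrate that reasoning with a minimal sketch (plain C, not bhyve's actual emulation code; the opcode names and fixed-size queue are simplified assumptions): several flush commands already sitting in the submission queue can be satisfied by a single fsync() of the backing store, provided none of them is reported complete until that fsync() returns.

    /* Sketch only: two FLUSH commands already in the queue are coalesced
     * into one fsync(), and neither is reported complete until it returns. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define OPC_WRITE 1
    #define OPC_FLUSH 2

    struct cmd { int opcode; int cid; };

    /* Stand-in for posting a completion queue entry back to the guest. */
    static void complete(int cid) { printf("completed cid %d\n", cid); }

    static void process_queue(int backing_fd, struct cmd *sq, int n) {
        int flush_cids[64], nflush = 0;
        for (int i = 0; i < n; i++) {
            if (sq[i].opcode == OPC_FLUSH)
                flush_cids[nflush++] = sq[i].cid;  /* defer completion */
            else
                complete(sq[i].cid);  /* data commands: durability not required yet */
        }
        if (nflush > 0) {
            fsync(backing_fd);        /* one sync covers every queued flush */
            for (int i = 0; i < nflush; i++)
                complete(flush_cids[i]);  /* reported only after the data is durable */
        }
    }

    int main(void) {
        int fd = open("/tmp/backing.img", O_RDWR | O_CREAT, 0600);
        struct cmd sq[] = { {OPC_WRITE, 1}, {OPC_FLUSH, 2}, {OPC_FLUSH, 3} };
        process_queue(fd, sq, 3);
        close(fd);
        return 0;
    }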
There is also ClonOS which aims to be Proxmox for FreeBSD, though it’s relatively young.
https://clonos.convectix.com/
And, while not FreeBSD, something that comes with a lot of the same technologies (ZFS, bhyve, DTrace, etc.).
https://omnios.org/
The FreeBSD (ZFS, NVMe) label refers to the virtual controller type inside the VM.
For example, the NVMe controller is specified like this:
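A minimal sketch of such a disk definition, assuming the raw bhyve -s flag (the slot number and the zvol path are illustrative placeholders, not the poster's actual setup):

    -s 4,nvme,/dev/zvol/zroot/vms/guest0/disk0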
You can also use 'virtio-blk' or 'ahci-hd' instead.