Writing to the framebuffer is extremely slow on bare metal.
Setting the caching behavior of the framebuffer to write combining results in significantly higher throughput.
Either the bootloader should set up the framebuffer with write combining,
or the readme/migration guides should mention this.
(It took me quite some time to figure out that this was my bottleneck.)
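
For anyone hitting the same problem, here is a rough sketch of the arithmetic behind enabling write combining through the page attribute table (PAT). It assumes an x86_64 target with PAT support and 4 KiB pages, which the report above does not state, and every name in it (`pat_with_entry`, `pte_cache_bits`, the constants) is hypothetical rather than part of the bootloader's API. It only computes the values; it does not write the MSR or touch any page tables.

```rust
// Sketch only: assumes x86_64 with PAT support and 4 KiB pages.
// All names below are hypothetical and not taken from any existing crate.

/// Address of the IA32_PAT model-specific register.
pub const IA32_PAT_MSR: u32 = 0x277;

/// PAT value after reset (entries 0..=7: WB, WT, UC-, UC, WB, WT, UC-, UC).
pub const PAT_RESET_VALUE: u64 = 0x0007_0406_0007_0406;

/// PAT memory-type encoding for write combining.
pub const MEM_TYPE_WC: u8 = 0x01;

/// Returns `pat` with entry `index` (0..=7) replaced by `mem_type`.
pub fn pat_with_entry(pat: u64, index: u8, mem_type: u8) -> u64 {
    assert!(index < 8, "the PAT has eight entries");
    assert!(mem_type < 8, "memory types are 3-bit encodings");
    let shift = u32::from(index) * 8; // each entry occupies one byte
    (pat & !(0xFFu64 << shift)) | (u64::from(mem_type) << shift)
}

/// Returns the 4 KiB page-table-entry bits that select PAT entry `index`.
/// The index is encoded as PAT (bit 7), PCD (bit 4), PWT (bit 3).
pub fn pte_cache_bits(index: u8) -> u64 {
    assert!(index < 8, "the PAT has eight entries");
    let i = u64::from(index);
    let pwt = i & 1;
    let pcd = (i >> 1) & 1;
    let pat = (i >> 2) & 1;
    (pwt << 3) | (pcd << 4) | (pat << 7)
}
```

With these helpers, `pat_with_entry(PAT_RESET_VALUE, 4, MEM_TYPE_WC)` gives a PAT value whose entry 4 is write combining, and `pte_cache_bits(4)` gives the bits to OR into each page table entry that maps the framebuffer. Choosing entry 4 is an arbitrary decision for this sketch. Actually applying it means writing the value to `IA32_PAT_MSR` with `wrmsr` in ring 0 on every core and flushing the TLB afterwards; huge pages keep their PAT bit at a different position (bit 12), which this sketch does not handle.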