1. 29
  1.  

    1. 13

      The diagram at the top is great. It’s really disappointing that this information isn’t stated so explicitly in the official documentation.

      1. 1

        Holy crap, you weren’t kidding about that diagram.

    2. 2

      fsync on macOS doesn’t flush to stable storage!?

      This comment from two years ago (https://news.ycombinator.com/item?id=30371403) seems to suggest that: a) BSD does the same, and b) drives may not actually flush when asked.

      I’ve definitely heard in the past that drives “have enough time / capacitors / rotational energy to flush the write to stable storage”, but only anecdotally?

      Anyone got more anecdata?
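
      For what it’s worth, the usual escape hatch on macOS is fcntl(fd, F_FULLFSYNC), which does ask the drive to flush its volatile cache. A minimal sketch (durable_sync is just an illustrative name; the fall-back-to-fsync pattern is roughly what SQLite does):

          /* fsync() on macOS pushes data to the drive but doesn't ask the
           * drive to drain its volatile write cache; F_FULLFSYNC does. */
          #include <fcntl.h>
          #include <unistd.h>

          static int durable_sync(int fd) {
          #ifdef F_FULLFSYNC
              /* macOS: also request a flush of the drive's volatile cache. */
              if (fcntl(fd, F_FULLFSYNC) == 0)
                  return 0;
              /* Some filesystems don't support F_FULLFSYNC; fall back. */
          #endif
              return fsync(fd);
          }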

      1. 3

        It turns out BSD’s UFS does indeed do the same: fsync() is implemented relying on soft updates to provide consistent ordering, but it never issues a BIO_FLUSH to make the written data fully durable. I’ve updated the post to include this.

        The notion of drives not respecting a FLUSH command or FUA flag is a persistent myth with some truth to it. Likely for NDA reasons, it’s hard to get anyone to name an exact drive model and manufacturer, but the social wisdom seems pretty consistent that there exist consumer drives which lie about data durability to look better on benchmarks. This isn’t even considering firmware bugs in drives or anything “accidental”.

        “Consumer drives” is key there, as enterprise-grade drives are where the manufacturer will include supercapacitors holding enough charge to let the drive complete any promised writes before its volatile cache is lost, so that it can intentionally respond to FLUSH or FUA with a preemptive success. It’s still fair to question whether that backup power is enough to handle the worst case (a full volatile cache + significant FTL garbage collection work + degraded capacitors + etc.), but it’s at least probably true, versus a complete lie on cheap consumer drives.

    3. 1

      I’m not sure the diagram at the top is entirely correct, specifically with regard to the difference between O_DIRECT and O_DIRECT|O_SYNC. AFAIK there’s no difference in whether the data written to the storage device is forcefully flushed from its on-board volatile cache into non-volatile storage (that will happen in either case, I think… at least on Linux) – what the addition of O_SYNC provides is a guarantee that the metadata updates required to access the written data (e.g. the file size, if the write extends beyond the previous EOF) will also be performed synchronously – which the body of the article sort of gets into later on, so what the diagram is showing here seems a bit mysterious.

      1. 1

        I see the interpretation you’re using of the man page:

               O_DIRECT (since Linux 2.4.10)
                      Try to minimize cache effects of the I/O to and from this
                      file.  In general this will degrade performance, but it is
                      useful in special situations, such as when applications do
                      their own caching.  File I/O is done directly to/from
                      user-space buffers.  The O_DIRECT flag on its own makes an
                      effort to transfer data synchronously, but does not give
                      the guarantees of the O_SYNC flag that data and necessary
                      metadata are transferred.  To guarantee synchronous I/O,
                      O_SYNC must be used in addition to O_DIRECT.  See NOTES
                      below for further discussion.
        

        However, the FUA flag is controlled only by O_DSYNC (and O_SYNC is __O_SYNC|O_DSYNC); O_DIRECT is only about skipping the page cache. It’s also worth mentioning that appending to a file with O_DIRECT is to be avoided, which I’ll go edit the post to add.
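
        To make that concrete, here’s a rough sketch of the combination this implies, assuming Linux and a 4096-byte alignment requirement (the real value depends on the filesystem and device; write_block and its parameters are made up for illustration):

            #define _GNU_SOURCE          /* for O_DIRECT */
            #include <fcntl.h>
            #include <stdlib.h>
            #include <string.h>
            #include <unistd.h>

            /* O_DIRECT bypasses the page cache; O_DSYNC lets the kernel use
             * FUA writes where the device supports them. Preallocating up
             * front means no write ever extends the file, which avoids the
             * O_DIRECT append problem mentioned above. */
            static int write_block(const char *path, off_t off, size_t len) {
                int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_DSYNC, 0644);
                if (fd < 0) return -1;

                if (posix_fallocate(fd, 0, off + (off_t)len) != 0) {
                    close(fd);
                    return -1;
                }

                /* O_DIRECT requires buffer, offset, and length alignment. */
                void *buf;
                if (posix_memalign(&buf, 4096, len) != 0) {
                    close(fd);
                    return -1;
                }
                memset(buf, 0, len);     /* placeholder payload */

                ssize_t n = pwrite(fd, buf, len, off);
                free(buf);
                close(fd);
                return n == (ssize_t)len ? 0 : -1;
            }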

    4. 1

      Lovely readable site layout as well. That’s what I call well-presented information.