1. 12
  1.  

    1. 4

      Huh… before he mentioned that the directory for tomorrow’s date was created early, I was waiting for the bug to be that a sync-spam run started at 11:59pm would check the current date at 12:01am.

    2. 4

      It seems to me that, given the possibility of concurrent writes, marking a directory as “fully processed” is the source of errors.

    3. 4

      I hadn’t heard of Steinmetz before, but his Wikipedia page is a fascinating read (keeping in mind he was born in 1865!), and the following anecdote I particularly adore:

      When Joseph LeRoy Hayden, a loyal and hardworking lab assistant, announced that he would marry and look for his own living quarters, Steinmetz made the unusual proposal of opening his large home, complete with research lab, greenhouse, and office to the Haydens and their prospective family. Hayden favored the idea, but his future wife was wary of the unorthodox arrangement. She agreed after Steinmetz’s assurance that she could run the house as she saw fit.

      After an uneasy start, the arrangement worked well for all parties, especially after three Hayden children were born. Steinmetz legally adopted Joseph Hayden as his son, becoming grandfather to the youngsters, entertaining them with fantastic stories and spectacular scientific demonstrations. The unusual, harmonious living arrangement lasted for the rest of Steinmetz’s life.

    4. 4

      I suspect his corrected version still has a problem.

      His first issue was that the next day’s directory might be created early… giving a window on the order of ~5 minutes in which it could erroneously be processed and marked complete. The fix was to mark only previous day directories as completed.

      I would like to point out that there is also a race condition around the precise moment when the date changes. We know it takes on the order of ~1 minute to read a directory, so depending on how the code is written there might be a ~1 minute window or perhaps only a ~few milliseconds window in which this application and the other applications that are writing records might disagree about what the date is. Either way, it leaves a possibility for imperfect data.

      Given that skipping historical directories is an optimization (and the system worked without that optimization in place), I would probably switch to marking directories as “completely scanned” only if they are older than yesterday (instead of older than today). That means we will continue re-scanning yesterday’s directory for a full day, but it also gives us a full 24 hours to be confident that all systems agree the date has changed. (If the optimization were more important, I might give it 30 mins instead of 24 hours – but certainly something much larger than the expected run time of any processes that probably cache the date.)

    5. 3

      Marginally (ir)relevant aside: if you are wondering about the square quotes, they are mjd’s way of writing scare quotes after I made a jokey suggestion in response to one of mjd’s shitposts.

      1. 3

        I definitely interpreted them as “the ceiling of simple”, which works here too I think!

      2. 2

        You’re practically famous

        1. 1

          I’m less famous than my email address 🤓

          1. 1

            Excellent use of Austria.

      3. 2

        You are making the world a better place.

      4. 1

        My internet-fried brain interpreted this as “simple” being the bugfixes’ stand name. But that’s just me.

    6. 2

      In my experience, bugs are usually easy to fix; it’s finding the root cause that is complex.

      1. 2

        Oh, the same bug as OpenSSH 😭

    7. 2

      I was initially expecting a local-versus-UTC issue, where the directory would need to be marked as read multiple hours late.

      Honestly, if I find something creating the future directory five minutes early, I would assume something else adds some message to yesterday five minutes late. (So there is a need for some grace period after the end of the date)
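
The grace-period idea raised in this thread (mark a day's directory as completely scanned only once the day has been over for a while) can be sketched roughly as below. This is a minimal illustration, not the code from the article: the function name `is_safe_to_mark_complete`, the choice of UTC as the clock that defines a "day", and the 24-hour default are all assumptions made for the example.

```python
from datetime import date, datetime, time, timedelta, timezone


def is_safe_to_mark_complete(
    dir_date: date,
    now: datetime,
    grace: timedelta = timedelta(hours=24),  # assumed default, per the "older than yesterday" suggestion
) -> bool:
    """Return True once the day named by `dir_date` ended at least `grace` ago.

    Hypothetical helper: `now` must be timezone-aware, and days are
    assumed to roll over at midnight UTC.
    """
    # The directory's day ends at midnight UTC at the start of the next day.
    day_end = datetime.combine(
        dir_date + timedelta(days=1), time(0, 0), tzinfo=timezone.utc
    )
    # Future directories and today's directory can never pass this check,
    # and yesterday's only passes after the grace period has elapsed.
    return now >= day_end + grace
```

With the 24-hour default, a run at 00:04 UTC on 2024-03-10 would keep re-scanning the 2024-03-09 directory and would treat 2024-03-08 as finished. Passing `grace=timedelta(minutes=30)` gives the shorter window mentioned above, at the cost of trusting that every writer notices the date change within half an hour.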