1. 42
    1. 7

      For recurring events, you need to store the “intended” timezone anyway.

      (In my previous job, we all learned that even among places that all observe DST, the switch does not happen on the same day for everyone. We had recurring meetings scheduled by different people in their own timezones, so around the DST transitions there was some shuffling.)

      1. 3

        We refer to this as the Bi-Annual Festival of Calendar Fire.

      2. 2

        What about a recurring event that periodically changes location to a different timezone?

        1. 4

          Deliver electric shock to the organizer?

          Honestly, I think that’s a much more niche case, so that’s where I would say the complexity required is likely not worthwhile; schedule that manually :)

    2. 7

      I’ve gotten into big arguments with people about this before: lots of engineers are convinced that everything MUST be in UTC no matter what. The best argument I’ve seen is that instead of storing the user’s timezone we should store UTC and the user’s GPS coordinates, and then compute the timezone from where they made the event.

      1. 10

        I think one of the issues is that a lot of people treat all time and date related issues the same. So when you get a rule (even a very useful rule!) like “store timestamps as UTC”, it’s very easy to apply that rule to things that aren’t really timestamps at all.

        In the case of a planned event, we’re not storing a timestamp; we’re storing the configuration a user has given us to generate a timestamp. It’s the same way that when a user creates a chart in Microsoft Excel, Excel doesn’t internally store the rendered chart; it stores the configuration as defined by the user, so that the user can later come back and change that configuration as needed.

        With an event, the user is probably thinking in terms of a wall clock time, and a location where that wall clock is hanging. In practice, that second parameter is usually implicit, so figuring out how to get the user to provide it is difficult, but using GPS could be useful (bearing in mind that a user’s current location may not be the location of the event), or maybe providing a list of timezones (not offsets) where the user can search for their own city and find the correct timezone.
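        A minimal sketch of that idea, assuming Go and a hypothetical Event type: the stored record is the user’s configuration (a wall clock time plus a named timezone), and the UTC instant is derived on demand, under whatever timezone rules are current at that moment.

        ```go
        package main

        import (
            "fmt"
            "time"
        )

        // Event stores the user's intent, not a resolved instant: the wall clock
        // they asked for and the named (IANA) zone where that wall clock hangs.
        type Event struct {
            LocalTime string // e.g. "2026-06-02T18:00:00" (no offset)
            Zone      string // e.g. "Europe/Paris" (a name, not an offset)
        }

        // Resolve derives the instant using the timezone rules known right now;
        // like the Excel chart, the "rendering" happens late.
        func (e Event) Resolve() (time.Time, error) {
            loc, err := time.LoadLocation(e.Zone)
            if err != nil {
                return time.Time{}, err
            }
            return time.ParseInLocation("2006-01-02T15:04:05", e.LocalTime, loc)
        }

        func main() {
            ev := Event{LocalTime: "2026-06-02T18:00:00", Zone: "Europe/Paris"}
            t, err := ev.Resolve()
            if err != nil {
                panic(err)
            }
            fmt.Println(t.UTC()) // the UTC value is an output, not the stored truth
        }
        ```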

      2. 6

        People traveling love events being in whatever timezone their train/plane/ship happened to be in!

        1. 1

          Until we try to plan an event at home while travelling

      3. 5

        Yeahhh. Working in a timeseries domain where every DST transition means more bugs coming out of the woodwork has convinced me that the everything-must-be-UTC thing is a bit of a dogma.

      4. 3

        Two timestamps! Consider the case where a law that changes DST is passed between the point when the event was scheduled and when it actually occurs. To figure out whether the law actually shifts the time, you need to know whether it came into force between those two moments, so you need to know two times. So it looks like the info you actually need is:

        • UTC for when the event was scheduled
        • the offset, in seconds, from the moment of scheduling to the actual event, given the timezone information known at the moment of scheduling
        • GPS coordinates of where the event was scheduled
        • (out of band) timezone information, including all the preceding historical changes to it!
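        A sketch of a record with those fields (Go, hypothetical names; the timezone database itself stays out of band):

        ```go
        package main

        import (
            "fmt"
            "time"
        )

        // ScheduledEvent captures what the scheduler knew at scheduling time, so a
        // later timezone-rule change can be detected by recomputing the offset and
        // comparing it with the one recorded here.
        type ScheduledEvent struct {
            ScheduledAtUTC time.Time // UTC instant at which the event was scheduled
            OffsetSeconds  int64     // scheduling moment -> event, per the rules known then
            Lat, Lon       float64   // where the event was scheduled
        }

        // NominalTime is the event instant implied by the rules known at scheduling time.
        func (s ScheduledEvent) NominalTime() time.Time {
            return s.ScheduledAtUTC.Add(time.Duration(s.OffsetSeconds) * time.Second)
        }

        func main() {
            s := ScheduledEvent{
                ScheduledAtUTC: time.Date(2025, time.November, 27, 9, 0, 0, 0, time.UTC),
                OffsetSeconds:  90 * 24 * 3600, // an event roughly three months out
                Lat:            48.85, Lon: 2.35,
            }
            fmt.Println(s.NominalTime())
        }
        ```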

        Coincidentally, I covered the three “flavors” of time (stop watch, absolute, calendar) in the recent IronBeetle episode: https://www.youtube.com/watch?v=3vz3NeO-GkY&list=PL9eL-xg48OM3pnVqFSRyBFleHtBBw-nmZ&index=51. Luckily, we don’t have to deal with calendar time in TigerBeetle!

        1. 3

          Surely you can avoid UTC altogether by storing the location of the event and the wall clock time of that location? That, I would have thought, is how most people plan an event conceptually.

          It gets a bit complicated if you have people in other timezones, and then the timezone rules update, because you might then need to send out new notifications (“the event is in one hour” becomes “the event was an hour ago and you missed it, sorry”). You might also need to store separate start and end locations for events that move between different timezones (e.g. flights). But this avoids storing things in UTC and then doing complicated calculations on those values.

          1. 1

            Yeah, that seems pretty solid. Store the place and expected local time of the event, and then determine a timezone definition (which can change over time) from those things when you need to reference the event’s time against some other point in time.

            Although, given DST, you would still need a way to disambiguate times during the DST transition: when the clock rolls back an hour, local time goes through 2:30am (or whatever) twice in the same day. I don’t know if there is any preëxisting notation to disambiguate these duplicate local times.

            I suppose this is also a problem for planning late-night events even without computer involvement. More reasons to abolish DST!

            1. 1

              The way I like to disambiguate time when the clocks go back is using an earlier/later flag. As well as DST transitions it can cope with things like time when travelling — it’s more general than an is_dst flag. Dunno what the notation should be, tho! Maybe 2024-10-27T01:23:45<[Europe/London] vs 2024-10-27T01:23:45>[Europe/London] for earlier vs later.
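              A rough sketch (Go, hypothetical helper) of how an earlier/later flag could be resolved against the timezone database: enumerate the instants whose wall clock reading in the zone matches, then let the flag pick the first or the last.

              ```go
              package main

              import (
                  "fmt"
                  "time"
              )

              // matchingInstants returns the instants whose wall clock in loc equals the
              // requested local time: two during a fall-back (earliest first), one normally,
              // zero during a spring-forward gap.
              func matchingInstants(year int, month time.Month, day, hour, min, sec int, loc *time.Location) []time.Time {
                  // time.Date picks one interpretation of an ambiguous wall clock;
                  // probing one hour either side finds any other interpretation.
                  base := time.Date(year, month, day, hour, min, sec, 0, loc)
                  var out []time.Time
                  for _, t := range []time.Time{base.Add(-time.Hour), base, base.Add(time.Hour)} {
                      lt := t.In(loc)
                      if lt.Year() == year && lt.Month() == month && lt.Day() == day &&
                          lt.Hour() == hour && lt.Minute() == min && lt.Second() == sec {
                          out = append(out, t)
                      }
                  }
                  return out
              }

              func main() {
                  london, _ := time.LoadLocation("Europe/London")
                  // 2024-10-27T01:23:45 happens twice in London (+01:00, then +00:00).
                  candidates := matchingInstants(2024, time.October, 27, 1, 23, 45, london)
                  earlier := candidates[0]               // 2024-10-27T01:23:45<[Europe/London]
                  later := candidates[len(candidates)-1] // 2024-10-27T01:23:45>[Europe/London]
                  fmt.Println(earlier, later)
              }
              ```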

          2. 1

            If you only store local time and location(s) then the difficulty is finding out if the timezone rules have changed and which times are affected. I suppose in principle you could calculate a diff of the timezones to find the time spans that are affected by the change, and then search for events planned in those time spans. The neat thing about storing a precalculated UTC is that you can spot when an event is wrong independent of any timezone update process. Finding wrong events might be a bit slow, though, if there are lots of them.

      5. 2

        Do you mean stupidest?

        That scheme means that instead of storing a local date-time and a symbolic timezone (which is already a location), you need to store a creation time, a timestamp, and a location, so that you can convert the location to a time zone and the timestamp to what the local time was at creation time (which time libraries may or may not support at all), before actually computing the current offset / non-local conversions.

        1. 1

          symbolic timezone (which is a location already)

          Only a small number of locations have symbolic timezones though.

      6. 1

        “Everything must be UTC” was a movement to counter problems caused by storing non-UTC Unix timestamps, and you will at minimum get consistent software behavior. Sometimes standard is better than good.

        1. 2

          More precisely it was due to problems with things like log files that wrote timestamps in local time without a zone offset.

          Unix time (time_t) is by definition UTC.

      7. 1

        A timestamp associated with an event (e.g. created_at), and how that timestamp should be displayed to a given user, are two separate things. Ideally you store the event timestamps in the events table (or whatever) in UTC, and you store user timezone information in the users table (or whatever) as makes sense, and when you render an event timestamp you pass the UTC event timestamp value thru the user timezone to get a localized output value.

        Mixing timestamps with different timezones, locales, offsets, etc. in a single e.g. column is a direct path to sadness.

        1. 2

          The OP is all about why storing everything in UTC is a bad idea!

          1. 5

            To clarify, I think writing created_at and similar when-something-happened timestamps as UTC is a great idea.

            The only times I don’t think “just use UTC” is good advice are times of events that are occurring in the future and for which the “local time” is the way people will be thinking about them - an evening event at 6pm for example.

            Those are the ones where weird edge cases may make you regret converting them to UTC and storing only that.

          2. 1

            The OP says that “the most important thing to record is the original user’s intent” – I agree! And I don’t think this is incompatible with storing event timestamps in UTC.

            If a user in UTC-5 creates a future event with a timestamp of next week Monday at 6PM user-local time, that create operation might arrive with a user-local timestamp of 2024-12-02T18:00:00-05:00. That user timestamp can always be transformed to a UTC timestamp of (in this case) 2024-12-02T23:00:00Z. And then if the user loads that created event, the UTC timestamp can be reverse-transformed to their local timezone without loss of information.

            If the user-submitted timestamp is ambiguous, like, I dunno, 2024-12-02 6PM, where the timezone is implicit, then this is easy – you just apply the user’s currently-configured local TZ when transforming the input to a UTC output.

            Essentially, written timestamps are resolved and fixed at time of input, and transformed to whatever the user’s TZ expectations might be at time of output.

            (I think this satisfies all of the concerns in the OP, at least!)
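            For what it’s worth, a minimal sketch of that round trip (Go; the zone name here is an assumption standing in for “the user’s currently-configured local TZ”):

            ```go
            package main

            import (
                "fmt"
                "time"
            )

            func main() {
                // Assume the user's configured zone is America/New_York (UTC-5 in December).
                userZone, err := time.LoadLocation("America/New_York")
                if err != nil {
                    panic(err)
                }

                // The create operation arrives with a zoned local timestamp...
                in, err := time.Parse(time.RFC3339, "2024-12-02T18:00:00-05:00")
                if err != nil {
                    panic(err)
                }

                // ...which is stored as UTC...
                stored := in.UTC()
                fmt.Println(stored) // 2024-12-02 23:00:00 +0000 UTC

                // ...and rendered back in the user's zone when the event is loaded.
                fmt.Println(stored.In(userZone)) // 2024-12-02 18:00:00 -0500 EST
            }
            ```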

            1. 4

              While I agree the transformation from 2024-12-02T18:00:00-05:00 to 2024-12-02T23:00:00Z is lossless, 2024-12-02T18:00:00-05:00 is not actually the user intent. The user intent is e.g. 2024-12-02T18:00:00 in their local time. Usually we approximate that as 2024-12-02T18:00:00[America/New_York] or 2024-12-02T18:00:00[America/Grand_Turk] or whatever depending on their political jurisdiction. This happens to be 2024-12-02T18:00:00-05:00 right now, but that’s not a lossless transformation - it depends on a lot of external state (politics and the legal system). And when we’re dealing with “human events”, users expect the calendar and wall clock time to be consistent, not the UTC offset.

              Now with a 3 day window, the odds of this changing unexpectedly are limited. That kind of rapid timezone change usually only happens in cases of political instability, in which case users are at least not going to be surprised that computers can’t keep up with the whims of local governing bodies. (This is also part of why the IANA TZ to city abstraction works. If NJ secedes from the US and declares they have a new time zone, users are also probably going to be less surprised that they need to pick that timezone.)

              But the further out your users schedule things, the more countries you’re going to run into that have timezone rule changes. Globally this happens tens of times a year. And then there are cases like when the EU announced they were abolishing DST in March 2021 and then just… didn’t (because of covid, officially, but I also suspect it would have gotten postponed anyway).

              1. 1

                All fair points.

                I definitely concede that, in the case where timestamps are provided directly by users, and those timestamps are expected to represent specific and user-oriented points in time, transforming them to UTC and storing them in that form is gonna have a lot of problems, mostly related to user expectations, as you’ve described.

                However, if you store timestamps with timezones, and especially if those timezones are IANA timezone strings, then the fundamental semantics of the timestamp change. With UTC timestamps, each value represents a well-defined and specific point in time, which occurs precisely once. You can build systems on top of that invariant, like job J1 should be triggered at timestamp T1. But timezoned timestamps don’t provide those guarantees; 2025-03-09T02:30:00[America/New_York] is a timestamp that will never actually occur, and 2025-11-02T01:30:00[America/New_York] will occur twice.

                I guess this just means UTC timestamps and timezoned timestamps serve two different purposes.

            2. 2

              Your “always” is not always true when the timezone rules change. There is loss of information about which local time and timezone the user intended.

              Storing UTC means you can’t automatically keep (say) the 11am meeting at 11am when the timezone rules change, because you lost the information you needed. This bug caused enormous unnecessary work for Microsoft Exchange users in 2007 when the North American DST schedule changed, because Exchange had converted the meeting times to UTC and thrown away the users’ intended plans.

              1. 3

                Thanks for the tip, that sent me down a fascinating rabbit hole!

                Here’s an archived support article from 2007 describing a tool Microsoft released to help people update the incorrect times in their calendars: https://web.archive.org/web/20070302224145/http://support.microsoft.com/kb/930879

                I added this to my blog post: https://simonwillison.net/2024/Nov/27/storing-times-for-human-events/#microsoft-exchange-and-the-dst-update-of-2007

    3. 2

      Stupid question: how do you store the “intended” time zone and date time? (e.g. in Postgres).

      As strings? (An arbitrary zone name + RFC 3339 for the date time)

      • “Europe/Paris”
      • “2025-05-04T15:04”
      1. 4

        Just discovered RFC 9557, which suggests 2022-07-08T00:14:07+02:00[Europe/Paris] (if the +2 is inconsistent with the location, inform the user)

        1. 4

          Storing an offset and a symbolic timezone seems mostly counterproductive: the entire point of a symbolic timezone is that the offset can change between the moment you create the event and its actual occurrence.

          I guess it could make sense to warn viewers from non-local timezones that the offset (and thus their own time) has changed, but then those are the ones you’d want to inform, not the creator of the event (who you’d assume is in the local timezone, and thus created an event set to 00:14:07 which is still set to 00:14:07; that the offset of their local timezone from UTC at the moment of the event has changed is unlikely to be relevant to them)

          1. 4

            Yes, the point of the duplicate storage is to raise an error if they don’t match. The textual form controls if you want to ignore errors.

            1. 2

              Exactly. I just made a PoC in Go: https://go.dev/play/p/HDXB_K6DyT_f

              I store the datetimeWithOffset := "2006-06-02T15:04:05+02:00" (originally computed offset, likely a good fit for the TIMESTAMPTZ type of Postgres - @tonyfinn) and the intended location Europe/Paris.

              From there, I “convert” the time to the intended location and compare the offset with the stored one:

              • if they match, everything is fine
              • otherwise a user intervention is likely needed.

              @simonw I think this would address the interesting issue you raised here, no?
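              For readers who don’t want to click through, here is a sketch of that kind of check (plain Go, not the exact linked PoC): compare the offset recorded at creation time with the offset the current timezone database gives for the same wall clock in the intended location.

              ```go
              package main

              import (
                  "fmt"
                  "time"
              )

              // offsetStillMatches reports whether the offset recorded when the event was
              // created still agrees with the current timezone database for the intended location.
              func offsetStillMatches(datetimeWithOffset, zone string) (bool, error) {
                  loc, err := time.LoadLocation(zone)
                  if err != nil {
                      return false, err
                  }
                  orig, err := time.Parse(time.RFC3339, datetimeWithOffset)
                  if err != nil {
                      return false, err
                  }
                  _, storedOffset := orig.Zone() // offset captured at creation time, in seconds

                  // Re-derive the offset for the same wall clock under today's rules.
                  wall := orig.Format("2006-01-02T15:04:05")
                  current, err := time.ParseInLocation("2006-01-02T15:04:05", wall, loc)
                  if err != nil {
                      return false, err
                  }
                  _, currentOffset := current.Zone()
                  return storedOffset == currentOffset, nil
              }

              func main() {
                  ok, err := offsetStillMatches("2006-06-02T15:04:05+02:00", "Europe/Paris")
                  if err != nil {
                      panic(err)
                  }
                  // false would mean the rules changed and the user should review the event
                  fmt.Println(ok)
              }
              ```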

              1. 3

                There are two problems here.

                First, the timezone change problem here is only a problem in the forward direction. No political entity has proposed retroactively changing dates in 2006. So let’s assume a date in 2026 instead of 2006.

                Secondly, TIMESTAMPTZ does not store “2026-06-02T15:04:05+02:00”. It stores no timezone or offset info whatsoever. What it stores is 1780405445000000 (microseconds since 1970-01-01T00:00:00Z). It’s basically a wrapper around transforming from the input zoned time to microseconds since the unix epoch on write, and producing a time in the connection timezone (default: system TZ) on read. But for most systems these days, a single global timezone at connection level or system level is insufficiently granular, since the timezone is a property of the data and not of the system.

                This has a few interesting effects:

                1. It’s no different to storing “2026-06-02T13:04:05Z”. The Postgres docs describe this as converting to UTC, and I’ve seen threads where people quibble about whether that’s really a UTC conversion or not, but the important part is that any offset or timezone information is lost, as is the time in the original timezone. 2026-06-02T13:04:05Z is the same instant as 2026-06-02T15:04:05+02:00, but until that time comes you cannot definitively say it is the same as 2026-06-02T15:04:05[Europe/Paris].
                2. The date you get back depends on the connection time zone. So if you query it with a system with its time zone set to Europe/Paris, and nothing changes with Paris’s timezone rules, you’ll get back 2026-06-02T15:04:05+02:00, sure, but if you query it with a system with its time zone set to America/New_York it’ll be 2026-06-02T09:04:05-04:00. So your offset has no value as an error checking mechanism; it’s just made up based on whatever the connection time zone is.
                3. If Paris does change its DST rules in the meantime (remember: the EU even has a passed resolution on the books where they’re planning to abolish DST), the time you actually want back is 2026-06-02T15:04:05+01:00, but for Postgres to give that result, 1780409045000000 would need to be stored in the TIMESTAMPTZ (while the actual value is 1780405445000000) and your connection timezone would have to be set to Europe/Paris.
                1. 1

                  Thanks, I wrongly thought that TIMESTAMPTZ would store the offset somehow.

                  So I would need a third column, storing this offset (in seconds, probably), so that I can check whether the offset (at the time of the event’s creation) is still correct at a later time (possibly after a zone change for Paris).

                  1. 3

                    You don’t want to store the offset; you want to store the city name/location. Politics dictate that the offset will change on occasion, whenever politicians get bored (globally this happens several times a year).

                    Also, cultures sometimes have their own offsets, different from the legal offsets, which further complicates things.

                    1. 1

                      If I only store the location, I won’t know if the offset changed (due to politics or whatever).

                      The goal is to know if the offset changed, to ask the user if the time should be updated as well.

                      1. 2

                        Or just always assume it has changed, and convert through the TZ database.

                        In either case you have to round-trip through the TZ DB to see if it changed, so wouldn’t it be easier to just assume it has and move on with life? There might be special use cases where you NEED to know if it changed, but in most cases you just care what the right value should be at the moment.

      2. 3

        Neither of Postgres’ timestamp types is actually very helpful for this scheduling use case. TIMESTAMPTZ converts the date from the input timezone to UTC, but this is an operation you want late binding for rather than early binding (as this article specifies), while TIMESTAMP will implicitly use the system TZ for many operations.

        Some options:

        • Two columns: TIMESTAMP + timezone. You then need to be careful to convert with AT TIME ZONE before using any database date functions
        • String + timezone. Again you have to convert for any date functions, in a more expensive way, but it’s harder to fix
        • String, timezone and denormalized UTC - you get a sortable date column, but you need to manage when to regenerate the denormalized column (a sketch of this option follows below)
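        A sketch of the third option on the application side (Go, hypothetical names): the local time plus zone is the source of truth, and the denormalized UTC value is something you recompute, e.g. whenever tzdata is updated.

        ```go
        package main

        import (
            "fmt"
            "time"
        )

        // EventRow mirrors a row with the three columns: local time string, zone
        // name, and a denormalized UTC value used only for sorting/indexing.
        type EventRow struct {
            LocalTime string    // "2026-06-02T18:00:00", source of truth
            Zone      string    // "Europe/Paris", source of truth
            UTC       time.Time // derived; must be regenerated when tz rules change
        }

        // RefreshUTC recomputes the denormalized value from the source of truth.
        func (r *EventRow) RefreshUTC() error {
            loc, err := time.LoadLocation(r.Zone)
            if err != nil {
                return err
            }
            t, err := time.ParseInLocation("2006-01-02T15:04:05", r.LocalTime, loc)
            if err != nil {
                return err
            }
            r.UTC = t.UTC()
            return nil
        }

        func main() {
            row := EventRow{LocalTime: "2026-06-02T18:00:00", Zone: "Europe/Paris"}
            if err := row.RefreshUTC(); err != nil {
                panic(err)
            }
            fmt.Println(row.UTC) // sortable and indexable, but not the source of truth
        }
        ```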
    4. 1

      This suggestion (store UTC and localtime) is squicky to me because it leaves a lot of space for representable illegal states. In my experience, parallel structures always have bugs, and they tend to be of the “happens in corner cases on Saturday and are really hard to figure out” type.

      Time is hard enough already, and I’m also not sure how this suggestion solves any of the problems in the “what can go wrong” section. In all the user error cases, the user misrepresented their intent to the system, so the stored user intent will be wrong. If a location or local TZ changes or is ambiguous, I’m at the mercy of tzdata-based timezone conversions regardless of what I stored. If I only store one time, worst case it’s only wrong once.

      1. 1

        I’m arguing for storing one - the local time - and then also storing a denormalized UTC copy if you happen to need that for other purposes.

        The local time one is the source of truth.

        1. 1

          storing one … also storing …

          I don’t understand; it still sounds like you’re arguing for storing two? If there’s one source of truth, why ever store anything else?

          1. 4

            Zoned times in disparate zones aren’t directly comparable and hence aren’t sortable as stored. If you want those queries to be efficient, you need an index on the UTC time, which, depending on the system you’re using for data storage, might require you to have it precomputed (and occasionally recomputed, when DST rules change).

          2. 2

            It’s denormalization - storing a duplicate value that can be derived from another value for performance and convenience.