Always use an enum for your status field

72

Always use an enum for your status field programming jmduke.com
via knl 2 months ago | 32 comments


1. 22
  
  Slackwise 2 months ago | link
  
  Along with this: don’t use booleans as “flags” in your database. Use some sort of timestamp instead. Now you know when it was set, which you’ll suddenly find useful down the line.
  1. 8
    
    stephenr 2 months ago | link
    
    Dates make a lot of sense for things where a date is relevant to the actual thing - a publish date, a modification date, a “sale starts”/ “sale ends” field.
    
    The fields where I’m using a boolean in a database, I want to be able to express two, or possibly three actual states (a nullable boolean): “on; off” or “on; off; no preference aka inherit default”.
    
    A date gives you at best “yes, as of/until <date>; no or maybe inherit default”
    
    If you want to know when some value changed, you want an audit log, which is more useful for auditing anyway because it isn’t limited to storing the last time it was changed, and it can store who changed it, what else was changed at the same time, etc.
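
A minimal sketch of the audit-log idea above, using Python's built-in sqlite3 module. The table names (settings, settings_audit), the changed_by column, and the set_flag helper are hypothetical and chosen only for illustration; a real system would likely record more context.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE settings (
        name  TEXT PRIMARY KEY,
        value INTEGER              -- nullable boolean: 1, 0, or NULL (inherit default)
    );
    CREATE TABLE settings_audit (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        name       TEXT NOT NULL,
        old_value  INTEGER,
        new_value  INTEGER,
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")


def set_flag(name, value, changed_by):
    """Update a flag and record who changed it, from what, to what."""
    with conn:  # one transaction: both writes commit or neither does
        row = conn.execute(
            "SELECT value FROM settings WHERE name = ?", (name,)
        ).fetchone()
        old_value = row[0] if row else None
        conn.execute(
            "INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)",
            (name, value),
        )
        conn.execute(
            "INSERT INTO settings_audit (name, old_value, new_value, changed_by) "
            "VALUES (?, ?, ?, ?)",
            (name, old_value, value, changed_by),
        )


set_flag("dark_mode", 1, "alice")
set_flag("dark_mode", 0, "bob")
set_flag("dark_mode", None, "alice")  # back to "inherit default"

# Every change is kept, not only the most recent one.
history = conn.execute(
    "SELECT old_value, new_value, changed_by FROM settings_audit ORDER BY id"
).fetchall()
```

The settings table keeps the plain nullable boolean, while the full history of who changed what lives in the audit table.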
    1. 3
      
      pm 2 months ago | link
      
      GP meant hacky flags that hold mutable state, not immutable boolean properties of a conceptual relational entity.
      
      If you have a boolean column and go around changing the value, you are very likely doing it wrong. Model your changes instead.
      
      Check Rich Hickey’s take on how state is handled in Clojure for a straightforward explanation of state vs data.
      1. 2
        
        manuraj 2 months ago | link
        
        Checked around but got some ambiguous results - could you share the specific take by Rich Hickey?
        
        2
        
        pm 2 months ago | link
        
        https://clojure.org/about/state
        
        It’s less detailed than I remember, but it does a good job of setting values and state apart.
        
        1
        
        weaksauce 2 months ago | link
        
        probably this one: https://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey/
  2. 3
    
    gpm 2 months ago | link
    
    When you do this, do you use null as the default and a timestamp to mean “set to not-default at this time”? If someone turns off a setting after turning it on, do you just set it back to null?
    1. 3
      
      Slackwise 2 months ago | link
      
      Good question, I guess it depends on what you’re using as a “flag” here, but I guess I should have specified for things you’re unlikely to toggle back. I guess once again, a kind of “status”, except not a linear status but various conditions. First one that comes to mind is a “soft delete” or an “archive”.
    2. 2
      
      thangalin 2 months ago | link
      
      Use 5th or 6th normal form and eliminate null altogether.
      
      https://dave.autonoma.ca/blog/2019/06/06/web-of-knowledge/
      1. 5
        
        stephenr 2 months ago | link
        
        Why is it that every article attempting to explain how null is some unspeakable horror, includes a tale about some application somewhere comparing to the string 'null' and ensuing chaos.
        
        You might as well say “don’t use Boolean, someone might compare it with string 'false'”.
        
        3
        
        thangalin edited 2 months ago | link
        
        Why is it that every article attempting to explain how null is some unspeakable horror, includes a tale about some application somewhere comparing to the string ‘null’ and ensuing chaos.
        
        Dereferencing a boolean (or any other properly initialized value) won’t cause a program to crash. And comparing a boolean to a string will result in a compile-time error. I’m guessing you already know that and are trolling at this point.
        
        https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/
        
        Funny story. I was working at a company that was building a high-visibility student transcript system. During a code review, I saw that someone had hard-coded a default value for the last name of a student to the literal string “null”. I brought this to the attention of another developer, stating that if a student actually had the last name of Null, they wouldn’t be able to use the system. He went away, came back a half hour later, and said, “You’re right, that’s a bug.”
        
        That would not have been a fun bug to track down; Hoare strikes again.
        
        https://www.wired.com/2015/11/null/
        
        Another issue, of course, is that a null sentinel allows comparing two different classes of objects to the same value. (It explicitly excludes the sentinel value from the set of values that are valid elements.)
        
        There are many, many reasons to make software both immutable and null-hostile, which is why the industry is slowly moving in that direction.
        
        https://github.com/google/guava/wiki/UsingAndAvoidingNullExplained
        
        3
        
        stephenr 2 months ago | link
        
        Dereferencing a boolean (or any other properly initialized value) won’t cause a program to crash.
        
        You haven’t answered the actual question. Why are people hardcoding a string literal 'null' when comparing to an actual null?
        
        Also, for the record - comparing a string to a boolean is a perfectly valid operation in any number of languages. It will equate to false if you’re doing things properly. I’m guessing you already knew that and are trolling at this point.
        
        someone had hard-coded a default value for the last name of a student to the literal string “null”
        
        Yes, that is a bug, because they’re treating a literal null and the string null as the same thing. If your language, or database doesn’t distinguish between null and 'null', pick a better language/database before you start telling everyone else on the planet that they shouldn’t use literal null, because someone somewhere is stupid enough to compare it with a string 'null'.
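
A small sketch of the distinction argued over above, again using Python's built-in sqlite3 module. The students table and its rows are made up for illustration; the point is only that a database NULL and the string 'null' are different values, so a student named Null is not the same as a missing last name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO students (last_name) VALUES (?)",
    [(None,), ("null",), ("Null",)],  # one missing name, two real names
)

# Rows with no last name at all: only the real NULL matches IS NULL.
missing = conn.execute(
    "SELECT COUNT(*) FROM students WHERE last_name IS NULL"
).fetchone()[0]

# Rows whose last name is the text "null" in any letter case.
# The real NULL never matches a string comparison.
named_null = conn.execute(
    "SELECT COUNT(*) FROM students WHERE lower(last_name) = 'null'"
).fetchone()[0]
```

The bug in the anecdote comes from collapsing these two cases into one, which the database itself does not do.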
  3. 2
    
    zie edited 2 months ago | link
    
    As long as you have sane auditing you can always go look. In your version you know what and when. With sane auditing you get all the W’s, well, perhaps not the why, unless you go ask the who ;)
  4. 1
    
    zk 2 months ago | link
    
    so instead of is_active being true | false, it’s either null (for “false”/unset) or a timestamp? am i understanding you right?
    1. 6
      
      hugomd 2 months ago | link
      
      It sounds like what they’re suggesting is instead of having is_active, you’d have activated_at with a timestamp instead, where it’s null | timestamp, null being not activated.
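
A rough sketch of the null | timestamp column described above, using Python's built-in sqlite3 module. The accounts table and the activate and is_active helpers are hypothetical names used for illustration only.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, activated_at TEXT)"
)
conn.execute("INSERT INTO accounts (id) VALUES (1), (2)")


def activate(account_id):
    """Record the activation time; NULL means 'not activated'."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE accounts SET activated_at = ? "
        "WHERE id = ? AND activated_at IS NULL",  # keep the first activation time
        (now, account_id),
    )


def is_active(account_id):
    """The old is_active boolean is derived from the timestamp."""
    row = conn.execute(
        "SELECT activated_at IS NOT NULL FROM accounts WHERE id = ?",
        (account_id,),
    ).fetchone()
    return row is not None and bool(row[0])


activate(1)
```

Here is_active(1) is true and is_active(2) is false, and the time of activation is available for account 1 at no extra cost.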
      1. 1
        
        kubanczyk 2 months ago | link
        
        Which is quite the opposite of the advice given in the featured article.
2. 9
  
  BrianDouglasIE 2 months ago | link
  
  Solid advice, hard to argue with.
3. 7
  olliej 2 months ago | link
  A variation of this is one of the best pieces of advice I ever received. Early in my professional career (possibly while I was technically still a student, since I started off as an unpaid OSS contributor), my first software engineering manager told me not to use bools as arguments for anything beyond single-argument functions of the form setFlag(bool), isFlagSet(), and similar. Always use enums, as it makes the meaning very clear.
  
  It’s important to recognize that this is something of a language issue, for example objc, rust, swift, etc understand named parameters are good and mitigate this.
  
  But for C, C++, Java, Pascal (Delphi!), etc it’s always the best approach unless you have ABI reasons that prevent it (basically not an issue for C, but for C++ it impacts mangling).
  
  There are numerous reasons bool arguments cause problems, the big ones in my experience being:
  
  If you read code and see doStuff("foo", true, false), what do those flags mean? In code bases I work in these days with a lot of bool parameters, the convention is doStuff("foo", /* makeItBlue */ true, /* deleteHomeDirectory */ false), meaning there’s a bunch of non-compiler-enforced (or enforceable) style choices.
  
  In languages that support function overloading - either explicitly or implicitly (default parameters are semantically function overloading) - you also run into “I added a new argument and everything continued to compile”, where the additional argument matches the type of existing parameters, so adding an argument just means the wrong argument mapping occurs in existing code (my approach in my current day-to-day code base is to introduce parameters as enums, and then “correct” to the coding style of bools for everything when everything is otherwise complete).
  
  Niche, because I work on Apple platforms: APIs that make sense (when reading them, not semantically) when written as true/false do not read well when using YES/NO, and vice versa. Using an enum makes that a non-issue.
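
To make the bool-versus-enum point concrete, here is a small sketch in Python (not one of the languages named above, used only for illustration). The Color and HomeDirectory enums and the do_stuff function are hypothetical stand-ins for the doStuff example. Python has no compile step, so this sketch checks the types at run time; a static type checker could flag the same mistake earlier.

```python
from enum import Enum


class Color(Enum):
    DEFAULT = "default"
    BLUE = "blue"


class HomeDirectory(Enum):
    KEEP = "keep"
    DELETE = "delete"


def do_stuff(name, color, home_directory):
    """Hypothetical stand-in for doStuff("foo", true, false)."""
    # Swapped or missing flags fail loudly instead of silently mapping
    # to the wrong parameter, which is what two bare bools would allow.
    if not isinstance(color, Color):
        raise TypeError("color must be a Color")
    if not isinstance(home_directory, HomeDirectory):
        raise TypeError("home_directory must be a HomeDirectory")
    return f"{name}: color={color.name}, home={home_directory.name}"


# The call site documents itself, with no /* makeItBlue */ comments needed.
result = do_stuff("foo", Color.BLUE, HomeDirectory.KEEP)
```

Swapping the two arguments raises a TypeError here, whereas swapping two booleans would be accepted without complaint.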
  1. 4
    
    dist1ll 2 months ago | link
    
    I definitely agree this is a language issue. I would say it doesn’t only apply to booleans, but more generally to all primitive types - e.g. it’s also unclear what doStuff(i, 0x231, 76) does!
    
    Btw: Rust doesn’t have named function arguments.
    1. 1
      
      olliej 2 months ago | link
      
      re: rust - yeah it’s weird, it’s been a few years since I worked in rust so I think I just mentally went “by and large it makes sensible design choices so it must have named parameters”. womp womp.
  2. 4
    
    algesten 2 months ago | link
    
    Rust doesn’t have named function parameters (sadly), but the rust-analyzer developer environment compensates for it by annotating it anyway. There’s an RFC for it: https://github.com/rust-lang/rfcs/issues/323
    1. 2
      
      olliej 2 months ago | link
      
      oh damn, I guess I just misremembered that entirely - in my defense it’s been a few years since I worked in rust and it’s a good design choice so I must have just assumed it did without thinking. derp :D
      1. 1
        
        pascalkuthe 2 months ago | link
        
        Maybe you were thinking of struct instantiation syntax:
        
        MyStruct { flagA: bool, flagB: bool} is always constructed by naming all fields: MyStruct{ flagA: true, flagB: true }. That works quite well.
        
        I am doubtful rust will add named parameters, since rust is very big on exhaustiveness (no default values for struct fields either, for example) and named parameters where you always need to specify all of them aren’t that useful.
  3. 3
    linkdd 2 months ago | link
    
    This is why I love atoms in Erlang/Elixir:
    
    do_stuff("foo", :blue, :delete_home_dir)
  4. 1
    
    3digitdev 2 months ago | link
    
    A useful note on top of your earliest point about function parameters - Python, for example, allows you to ENFORCE the kwargs. You can do def some_func(param1, *, flag1, flag2=False). This makes it so if you want to pass those flags you HAVE to call it as some_func(my_param, flag1=False, flag2=True)
    
    Does this mean you should be using flags? Nope it’s a code smell. But if you want to have non magical params on a function python does let you enforce it.
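
A runnable sketch of keyword-only parameters, which Python declares with a bare `*` in the signature. The function and parameter names follow the comment above and are otherwise arbitrary.

```python
def some_func(param1, *, flag1, flag2=False):
    """Everything after the bare * must be passed by keyword."""
    return (param1, flag1, flag2)


# Flags passed by name are accepted.
ok = some_func("x", flag1=False, flag2=True)

# Flags passed positionally are rejected when the call is made.
try:
    some_func("x", False, True)
except TypeError as exc:
    error = str(exc)
```

The restriction is part of the function definition, so every caller is held to it without any change on the calling side.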
    1. 1
      
      olliej 2 months ago | link
      
      oh that’s interesting, and a nice solution to introducing named parameters without breaking source compatibility. Is it callee or caller enforced? I assume callee, but I could see an argument for either - though I guess most python is now P3 so there’s less concern about having python libraries support being used in P2 or P3.
      
      One thing I started doing back when we added destructuring assignment in JS was to just pass objects and do:
      
      function f({arg1, arg2}) { ... } f({arg1: 1, arg2: "yay"})
      
      Which has quite a bit of overhead, but for entry point style functions that aren’t called a lot it makes things much nicer.
      
      Nowadays it’s probably not even that expensive if the target function is trivial - in the period that destructuring assignment and such were introduced JS engines weren’t as aggressive about lowering temporary object construction - it’s been a long time since I worked on a JS engine, but I would expect that today if the above f function were inlined, the temp object would not be created (mostly due to optimizations that were made for for(of))
4. 5
  
  hongminhee 2 months ago | link
  
  This post reminds me of this one I read a while back: Boolean Blindness.
5. 4
  
  MarkMLl 2 months ago | link
  
  There’s two cases here: one is fine, the other not so fine.
  
  If the status value is entirely private, cannot be accessed outside the single linkable unit (program or library) and never will be, then it’s fine to use an enum.
  
  If the status value might at some time become public, then the bit-patterns (numbers etc.) which underlie the enum should be published and locked down lest somebody well-meaning adds “uninitialised” to the start of the list thus incrementing the other values.
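
One way to lock down the underlying numbers, sketched in Python with explicit values instead of automatic numbering. The Status members here are hypothetical; the idea is only that new members are appended with fresh numbers and existing numbers are never reused or shifted.

```python
from enum import IntEnum, unique


@unique  # refuses to define the class if two members share a value
class Status(IntEnum):
    # These numbers are part of the published contract:
    # never renumber existing members, only append new ones.
    PENDING = 0
    ACTIVE = 1
    CLOSED = 2
    UNINITIALISED = 3  # added later and appended, so 0..2 keep their meaning


stored = int(Status.ACTIVE)  # what goes over the wire or into a database
restored = Status(stored)    # what a reader of that value gets back
```

Because every value is written out, inserting a member at the top of the list cannot silently shift the others.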
6. 3
  
  owl 2 months ago | link
  
  I also find setFoo(x, .active) much nicer than setFoo(x, true), even if there are only ever 2 values.
7. 1
  
  EvanHahn 2 months ago | link
  
  I love this idea. A very similar blog post: “Don’t use booleans”.
8. 1
  
  ansible-rs 2 months ago | link
  
  For enums or booleans, I’m often asking the developer to add something in the comments of the declaration that explains when this value is supposed to be in which state. Like ‘x’ is true after the connection is established, not when the connection is being attempted… that sort of thing. Helps a lot with debugging later down the line.
9. 1
  
  seabre 2 months ago | link
  
  Not only should you use an enum for your status field, you should consider managing that status with a state machine.
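
A minimal sketch of what managing a status with a state machine could look like in Python. The statuses, the TRANSITIONS table, and the transition helper are all hypothetical; a real workflow would define its own states and rules.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Allowed moves: anything not listed here is rejected.
TRANSITIONS = {
    Status.DRAFT: {Status.PUBLISHED, Status.ARCHIVED},
    Status.PUBLISHED: {Status.ARCHIVED},
    Status.ARCHIVED: set(),  # terminal state
}


def transition(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


status = transition(Status.DRAFT, Status.PUBLISHED)
```

Keeping the allowed moves in one table means an invalid change such as archived back to draft is rejected in one place instead of being checked ad hoc throughout the code.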