Along with this: don’t use booleans as “flags” in your database. Use some sort of timestamp instead. Now you know when it was set, which you’ll suddenly find useful down the line.
Dates make a lot of sense for things where a date is relevant to the actual thing - a publish date, a modification date, a “sale starts”/ “sale ends” field.
The fields where I’m using a boolean in a database, I want to be able to express two, or possibly three actual states (a nullable boolean): “on; off” or “on; off; no preference aka inherit default”.
A date gives you at best “yes, as of/until some date; no, or maybe inherit default”.
If you want to know when some value changed, you want an audit log, which is more useful for auditing anyway because it isn’t limited to storing the last time it was changed, and it can store who changed it, what else was changed at the same time, etc.
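A minimal sketch of the audit-log idea, assuming SQLite and made-up table and column names (`settings`, `settings_audit`, `changed_by`); none of these come from the thread. The setting stays a nullable boolean, and every change gets its own audit row:

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: a plain nullable boolean plus a separate audit table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE settings (
        name  TEXT PRIMARY KEY,
        value INTEGER              -- 1 = on, 0 = off, NULL = inherit default
    );
    CREATE TABLE settings_audit (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        name       TEXT NOT NULL,
        old_value  INTEGER,
        new_value  INTEGER,
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL
    );
""")


def set_setting(name, value, changed_by):
    """Update a setting and record who changed it, when, and from what."""
    row = conn.execute(
        "SELECT value FROM settings WHERE name = ?", (name,)
    ).fetchone()
    old_value = row[0] if row else None
    with conn:  # one transaction: the change and its audit row commit together
        conn.execute(
            "INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)",
            (name, value),
        )
        conn.execute(
            "INSERT INTO settings_audit "
            "(name, old_value, new_value, changed_by, changed_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (name, old_value, value, changed_by,
             datetime.now(timezone.utc).isoformat()),
        )


set_setting("dark_mode", 1, "alice")
set_setting("dark_mode", 0, "bob")
history = conn.execute(
    "SELECT old_value, new_value, changed_by FROM settings_audit ORDER BY id"
).fetchall()
```

Unlike a single timestamp column, `history` keeps every change rather than only the latest one.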
GP meant hacky flags that hold mutable state, not immutable boolean properties of a conceptual relational entity.
If you have a boolean column and go around changing the value, you are very likely doing it wrong. Model your changes instead.
Check Rich Hickey’s take on how state is handled in Clojure for a straightforward explanation of state vs. data.
Checked around but got some ambiguous results - could you share the specific take by Rich Hickey?
https://clojure.org/about/state
It’s less detailed than I remember, but it does a good job of setting values and state apart.
probably this one: https://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey/
When you do this, you use `null` as default and `timestamp` as `set to not-default at timestamp`, or? If someone turns off a setting after turning it on, you just set it back to `null`?
Good question, I guess it depends on what you’re using as a “flag” here, but I guess I should have specified for things you’re unlikely to toggle back. I guess once again, a kind of “status”, except not a linear status but various conditions. First one that comes to mind is a “soft delete” or an “archive”.
Use 5th or 6th normal form and eliminate `null` altogether.
https://dave.autonoma.ca/blog/2019/06/06/web-of-knowledge/
Why is it that every article attempting to explain how `null` is some unspeakable horror includes a tale about some application somewhere comparing to the string `'null'` and ensuing chaos?
You might as well say “don’t use Boolean, someone might compare it with string `'false'`”.
Dereferencing a boolean (or any other properly initialized value) won’t cause a program to crash. And comparing a boolean to a string will result in a compile-time error. I’m guessing you already know that and are trolling at this point.
https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/
Funny story. I was working at a company that was building a high-visibility student transcript system. During a code review, I saw that someone had hard-coded a default value for the last name of a student to the literal string “null”. I brought this to the attention of another developer, stating that if a student actually had the last name of Null, they wouldn’t be able to use the system. He went away, came back half an hour later and said, “You’re right, that’s a bug.”
That would not have been a fun bug to track down; Hoare strikes again.
https://www.wired.com/2015/11/null/
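The thread doesn’t show the actual transcript-system code, so this is only a guess at the shape of the bug: the function names and the case-insensitive comparison are assumptions made for illustration. It contrasts a string sentinel with a real `None`:

```python
from typing import Optional


def display_name_buggy(last_name: str = "null") -> str:
    # Hypothetical reconstruction: the string "null" doubles as "no value",
    # so a real person named Null is treated as missing.
    if last_name.lower() == "null":
        return "<missing>"
    return last_name


def display_name(last_name: Optional[str] = None) -> str:
    # None is outside the set of valid strings, so "Null" stays a valid surname.
    if last_name is None:
        return "<missing>"
    return last_name
```

Calling `display_name_buggy("Null")` reports the student as missing, while `display_name("Null")` returns the surname unchanged.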
Another issue, of course, is that a `null` sentinel allows comparing two different classes of objects to the same value. (It explicitly excludes the sentinel value from the set of values that are valid elements.)
There are many, many reasons to make software both immutable and null-hostile, which is why the industry is slowly moving in that direction.
https://github.com/google/guava/wiki/UsingAndAvoidingNullExplained
You haven’t answered the actual question. Why are people hardcoding a string literal `'null'` when comparing to an actual `null`?
Also, for the record - comparing a string to a boolean is a perfectly valid operation in any number of languages. It will equate to false if you’re doing things properly. I’m guessing you already knew that and are trolling at this point.
Yes, that is a bug, because they’re treating a literal null and the string null as the same thing. If your language or database doesn’t distinguish between `null` and `'null'`, pick a better language/database before you start telling everyone else on the planet that they shouldn’t use literal `null` because someone somewhere is stupid enough to compare it with a string `'null'`.
As long as you have sane auditing you can always go look. In your version you know what and when. With sane auditing you get all the W’s, well perhaps not the why, unless you go ask the who ;)
So instead of `is_active` being `true | false`, it’s either `null` (for “false”/unset) or a timestamp? Am I understanding you right?
It sounds like what they’re suggesting is instead of having `is_active`, you’d have `activated_at` with a timestamp instead, where it’s `null | timestamp`, `null` being not activated.
Which is quite the opposite of the advice given in the featured article.
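A small sketch of the `activated_at` reading, assuming a made-up `Account` model (only the names `is_active` and `activated_at` come from the thread). The boolean is derived from the nullable timestamp instead of being stored:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Account:
    # Hypothetical model: None means "never activated".
    name: str
    activated_at: Optional[datetime] = None

    @property
    def is_active(self) -> bool:
        # The boolean is computed from the timestamp rather than stored.
        return self.activated_at is not None

    def activate(self, now: Optional[datetime] = None) -> None:
        # Keep the first activation time if the account is already active.
        if self.activated_at is None:
            self.activated_at = now or datetime.now(timezone.utc)


account = Account("example")
was_active = account.is_active
account.activate(datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Note that this only covers two states. As pointed out upthread, it has no room for a third “inherit default” state, and toggling back means discarding the timestamp.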
Solid advice, hard to argue with.
As a variation of this, one of the best pieces of advice I ever received came early in my professional career from my first software engineering manager - possibly technically while I was still a student (started off as an unpaid OSS contributor): don’t use bools as arguments for anything beyond single-argument functions of the form `setFlag(bool)`, `isFlagSet()`, and similar - always use enums, as it makes the call site very clear.
It’s important to recognize that this is something of a language issue: for example, objc, rust, swift, etc understand named parameters are good and mitigate this.
But for C, C++, Java, Pascal (Delphi!), etc it’s always the best approach unless you have ABI reasons that prevent it (basically not an issue for C, but for C++ it impacts mangling).
There are numerous reasons bool arguments cause problems, the big ones in my experience being:
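- If you read code and see `doStuff("foo", true, false)`, what do those flags mean? In code bases I work in these days with a lot of bool parameters you see `doStuff("foo", /* makeItBlue */ true, /* deleteHomeDirectory */ false)`, meaning there’s a bunch of non-compiler enforced (or enforceable) style choices.
- In languages that support function overloading - either explicitly or implicitly (default parameters are semantically function overloading) - you also run into “I added a new argument and everything continued to compile”: the additional argument matches the type of existing parameters, so adding it just means the wrong argument mapping occurs in existing code. (My approach in my current day-to-day code base is to introduce parameters as enums, and then “correct” to the coding style of bools once everything else is complete.)
- Niche, because I work on Apple platforms: APIs that make sense (when reading them, not semantically) when written as true/false do not read well when using YES/NO, and vice versa. Using an enum makes that a non-issue.
I definitely agree this is a language issue. I would say it doesn’t only apply to booleans, but more generally to all primitive types - e.g. it’s also unclear what `doStuff(i, 0x231, 76)` does too!
Btw: Rust doesn’t have named function arguments.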
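To make the enum advice concrete, here is a rough Python sketch. `do_stuff`, `Color` and `HomeDirectory` are invented stand-ins for the `doStuff("foo", true, false)` example; Python only checks the types at runtime here, whereas a statically typed language would reject the bools at compile time:

```python
from enum import Enum


class Color(Enum):
    DEFAULT = "default"
    BLUE = "blue"


class HomeDirectory(Enum):
    KEEP = "keep"
    DELETE = "delete"


def do_stuff(name: str, color: Color, home: HomeDirectory) -> str:
    # Hypothetical stand-in for doStuff("foo", true, false).
    if not isinstance(color, Color) or not isinstance(home, HomeDirectory):
        raise TypeError("pass Color and HomeDirectory members, not bools")
    return f"{name}: color={color.name}, home={home.name}"


# The call site now says what each argument means.
result = do_stuff("foo", Color.BLUE, HomeDirectory.KEEP)
```

Swapping the two arguments by accident now fails loudly instead of silently deleting a home directory.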
re: rust - yeah it’s weird, it’s been a few years since I worked in rust so I think I just mentally went “by and large it makes sensible design choices so it must have named parameters”. womp womp.
Rust doesn’t have named function parameters (sadly), but the rust-analyzer developer environment compensates for it by annotating it anyway. There’s an RFC for it: https://github.com/rust-lang/rfcs/issues/323
oh damn, I guess I just misremembered that entirely - in my defense it’s been a few years since I worked in rust and it’s a good design choice so I must have just assumed it did without thinking. derp :D
Maybe you were thinking of struct initialization syntax:
`MyStruct { flagA: bool, flagB: bool }` is always constructed by naming all fields: `MyStruct { flagA: true, flagB: true }`. That works quite well.
I am doubtful Rust will add named parameters, since Rust is very big on exhaustiveness (no default values for struct fields either, for example) and named parameters where you always need to specify all of them aren’t that useful.
This is why I love atoms in Erlang/Elixir:
A useful note on top of your earliest point about function parameters - Python, for example, allows you to ENFORCE the kwargs. You can do `def some_func(param1, *, flag1, flag2=False)`. This makes it so if you want to pass those flags you HAVE to call it as `some_func(my_param, flag1=False, flag2=True)`.
Does this mean you should be using flags? Nope it’s a code smell. But if you want to have non magical params on a function python does let you enforce it.
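For anyone who wants to try it, a short sketch of keyword-only parameters. The function body is a placeholder; the bare `*` in the signature is standard Python 3 syntax:

```python
def some_func(param1, *, flag1, flag2=False):
    # Everything after the bare * must be passed by keyword.
    return (param1, flag1, flag2)


ok = some_func("x", flag1=False, flag2=True)

try:
    some_func("x", False, True)  # positional flags are rejected
except TypeError as exc:
    error = str(exc)
```

The check is driven by the function’s own signature, so the person writing the function decides, and callers get a `TypeError` at call time if they pass the flags positionally.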
oh that’s interesting, and a nice solution to introducing named parameters without breaking source compatibility. Is it callee or caller enforced? I assume callee, but I could see an argument for either - though I guess most python is now P3 so there’s less concern about having python libraries support being used in P2 or P3.
One thing I started doing back when we added destructuring assignment in JS was to just pass objects and do:
    function f({arg1, arg2}) { ... }
    f({arg1: 1, arg2: "yay"})
Which has quite a bit of overhead, but for entry point style functions that aren’t called a lot it makes things much nicer.
Nowadays it’s probably not even that expensive if the target function is trivial - in the period that destructuring assignment and such were introduced JS engines weren’t as aggressive about lowering temporary object construction - it’s been a long time since I worked on a JS engine, but I would expect that today if the above `f` function were inlined, the temp object would not be created (mostly due to optimizations that were made for `for(of)`).
This post reminds me of this one I read a while back: Boolean Blindness.
There’s two cases here: one is fine, the other not so fine.
If the status value is entirely private, cannot be accessed outside the single linkable unit (program or library) and never will be, then it’s fine to use an enum.
If the status value might at some time become public, then the bit-patterns (numbers etc.) which underlie the enum should be published and locked down lest somebody well-meaning adds “uninitialised” to the start of the list thus incrementing the other values.
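One way to lock the values down, sketched in Python with a made-up `AccountStatus` enum: write every value out explicitly, and only ever append new members with fresh numbers.

```python
from enum import IntEnum, unique


@unique
class AccountStatus(IntEnum):
    # Hypothetical public status codes. Values are written out so that
    # adding a member cannot renumber the existing ones.
    ACTIVE = 1
    SUSPENDED = 2
    CLOSED = 3
    UNINITIALISED = 4  # added later: appended with a fresh value


stored = 2  # e.g. a value persisted before UNINITIALISED existed
decoded = AccountStatus(stored)
```

With auto-numbered members, putting `UNINITIALISED` at the top would have shifted every stored value by one; here `decoded` is still `SUSPENDED`.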
I also find `setFoo(x, .active)` much nicer than `setFoo(x, true)`, even if there are only ever 2 values.
I love this idea. A very similar blog post: “Don’t use booleans”.
For enums or booleans, I’m often asking the developer to add something in the comments of the declaration that explains when this value is supposed to be in which state. Like ‘x’ is true after the connection is established, not when the connection is being attempted… that sort of thing. Helps a lot with debugging later down the line.
Not only should you use an enum for your status field, you should consider managing that status with a state machine.
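A bare-bones sketch of what that could look like, assuming an invented `PostStatus` enum and transition table; a real system would likely use a library or persist the transitions:

```python
from enum import Enum


class PostStatus(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Hypothetical transition table: which statuses may follow which.
ALLOWED = {
    PostStatus.DRAFT: {PostStatus.PUBLISHED, PostStatus.ARCHIVED},
    PostStatus.PUBLISHED: {PostStatus.ARCHIVED},
    PostStatus.ARCHIVED: set(),
}


def transition(current: PostStatus, target: PostStatus) -> PostStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


status = transition(PostStatus.DRAFT, PostStatus.PUBLISHED)
```

Illegal moves such as `ARCHIVED` back to `DRAFT` raise instead of quietly producing a state nobody planned for.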