Thursday, 19 February 2009

A Post About Nothing, or, How I Learned To Stop Worrying And Love Nullable<T>

Iain Holder has an interesting post about Nullable<bool> over on his blog, about the pitfalls of using nullable boolean values:

What does null mean? It's got to mean something. You're either pregnant or you're not. You can't have a third state. A light switch is either on or off. If a light switch doesn't exist then it's potentially very dangerous. Almost as dangerous as having a nullable bool.

nullable_field_warning

Personally, I like null; I think it’s a wonderfully useful and often-misunderstood concept. Null is not dangerous per se, but used correctly, it can be a very effective warning sign.

Null means we don’t know – or alternatively, that question doesn’t make sense in this context.

To run with Iain’s example for a moment, let’s write a patient tracking system for a hospital; every time a patient arrives, we store their medical details in a database – blood type, next of kin, whether they’re pregnant, that kind of thing – so that the doctors can recommend appropriate treatment. All fields are required – no nulls allowed. It’s nice, simple, saves time, saves lives.

All very well, until an ambulance brings in an unconscious woman who might need X-rays. Is she pregnant? She might be – you don’t know; she came in alone, and she’s unconscious. The clock is ticking… an X-ray might save her life, but if she’s pregnant, it could harm her unborn child. If our patient system makes us choose true/false with no third option, we’re potentially making a very dangerous assumption either way. We need a third option, so when a doctor gets that patient’s chart, they see that we don’t know, and they can make sure they do a pregnancy test before sending the patient for X-rays or administering potentially dangerous medication.

I’m not arguing that NULL is the only solution to this problem. The problem is universal, and nullable fields is just one of many possible solutions. It happens to be a solution that’s natively supported in most databases and platforms, and I personally think the semantics of “null = don’t know” are rather nice. You may disagree – but you still need something to indicate when data in your model is potentially inaccurate or missing.

Yes, null has no place in a perfect model…

If your model (whether it’s an OO domain model or a relational database model) is complete, perfect, accurate and consistent, then you’re laughing. You will never have null values because your model maps perfectly onto the problem domain you’re working in, and you know every detail of every  entity in that problem domain. You know every single customer’s date of birth; you have detailed records of the marketing preferences of every person in your database, and your model is so perfectly tuned to your business that there’s no sparse tables, no outer joins -

…but there’s no such thing as a perfect model

A model is an abstraction, and abstractions always leak. The real world isn’t domain-driven, or relational, or object-oriented – these paradigms are just ways of slicing and storing information about the real world that help us solve a problem or do a job. A lot of the time we’re making decisions based on the value of the information, but sometimes we need to make decisions based on whether that information is present or not.

A slightly less contrived real-life example might help. In one of the systems I work on, we have a nullable bit field in our main customer database for e-mail opt-in preferences. We interpret the true/false/null values as follows:

True This customer has agreed to received marketing information via e-mail
False This customer has chosen not to receive marketing information via e-mail
Null We have no record of this customer's preferences.

This is the second Google Image search result for "null". I have no idea what it is - something to do with magnetic topology - but I think it's quite beautiful. When we send out e-mail newsletters, we only include the customers whose field is actually true – that’s the point of opt-in marketing, right? Any customer can log in and change their preferences at any time, so the customers who have opted-out (i.e. their value is false), we leave them alone – they’ve said they don’t want to get hassled, so we don’t hassle them, and if they change their mind, they can log in and reactivate their newsletter any time.

So what about null? Well, when a customer logs in, they get a personalized welcome page. If we see that their opt-in field is null, we add a message to this page saying “Hey, we have this e-mail newsletter you can subscribe to - could you take a moment to let us know whether you’re interested?” The point is, as soon as they’ve expressed a preference one way or the other, we’ll stop showing them this message, so a customer should only see this message the first few times they’ve logged in. We end up with better data; they don’t get unwanted e-mail, and everyone’s happy.

The weird swirly picture there turned up in a Google Images search for “null” – I don’t know enough about magnetic topology to have the faintest idea what it is, but it’s quite beautiful, don’t you think? Image © Colin Beveridge

4 comments:

NickG said...

NULL facilitates advanced data hiding techniques.

Where did you find that first image?!

Dylan said...

Photoshop, innit?

Iain Holder said...

I agree (to a point <g>) that it's a reasonable and pragmatic solution.

However in the spirit of weak opinions strongly held, I submit two further points:

1. I assume you have bit field in the database which is nullable. There's no reason why that couldn't be mapped to an enum of opted-in, opted-out and not-known. This would mean that the dumb database has this weirdness, but your lovely domain model has less ambiguity. Is it worth it? Debatable. But currently you have to know that the business rule for that field being null is that you haven't collected that information yet. What if you have a null in your AddressLine4 field. Does that mean the person doesn't have a 4th line or that they do, but you haven't collected it yet?

2. Comparing two NaNs will always give false, this is by design as you could've arrived to the NaNs in different ways. I don't see too much of a difference between comparing two NaNs and two nulls. (Oh the irony).

It's ok to ask if something *is* null. Just like you can ask if a number is NaN, but to compare one null with another is possibly crossing the conceptual boundary.

Dylan said...

Mapping nullable bools to enums might work quite nicely, actually. I can think of several scenarios where this would make sense.

As for AddressLine4... I disagree with the commonly-held principle that addresses should be stored as four separate lines. I have never done anything with those four address lines except join them together. I prefer to store Address as a single multiline text field, with Postcode and Country in their own fields, and enforce the "null = unknown or not supplied" interpretation.

As for null==null returning false... yes, it makes sense, and fits perfectly with the "null means unknown" interpretation. The ANSI SQL-92 standard actually specifies that any comparison of anything to NULL should return false, so there's at least one language standard that agrees with this principle. Do you know of any other languages that behave this way? Imagine modifying C# so that any comparison to NULL always returned false; legacy code would fail horribly, but I think you could actually construct something quite elegant. Hmm. Maybe there's another post in there...