Yuval Shavit: nulls

Here at last is that rant about Optional<T> I've promised for so long. Let me preface it by saying that I am not about to propose an ideal way of handling nulls in Java; I don't think Java's null handling will ever be great. That said, there are better and worse ways of doing it, and I think Optional<T> isn't the best way. What's worse, it's edging out a better way.

For the unfamiliar, Optional<T> is a Guava class that aims to eliminate NullPointerExceptions. It has two forms: Optional.absent() and Optional.of(T item). Rather than a method passing back a nullable Foo, it returns an Optional<Foo>. You then call isPresent(), followed by get() iff the item is present.

Optional<Foo> myFooOpt = tryGetFoo();
if (myFooOpt.isPresent()) { // like a != null check
    Foo myFoo = myFooOpt.get();
    // work with the foo
} else {
    throw new NoFooFoundException(); // or whatever
}

The idea is that since you have to call get() to get at the Foo, you'll probably remember to check isPresent first — and thus, no NPEs. It seems reasonable enough, but there are two big problems with it. First, it's verbose; and second, it's not backwards compatible.

The verbosity comes down to a lack of pattern matching in Java. Optional<T> is inspired by functional programming languages that have pattern matching — think of it (very roughly) as an instanceof check combined with an un-constructor. Here's how you'd use Haskell's equivalent of Optional<T>:

case tryGetFoo of
    Just foo -> handleFoo foo
    _ -> handleNoFoo

See how much cleaner that is? Optional<T>-type constructs really benefit from a terse way to get at the wrapped object. Pattern matching helps in two ways: it combines the isPresent() check with the get(), and it thereby eliminates the need for that temporary, throwaway reference to myFooOpt.

Java is trying to move away from verbose boilerplate; one could argue that the driving force behind both Java 7 and 8 is conciseness, not new features. So why is the Java world embracing the overly-verbose Optional<T>?

The backwards compatibility problem is more clear-cut: existing libraries can't be retrofitted with Optional<T> without huge changes to how overload and method resolution is handled. For instance, Map.get returns V — you can't just change it to return Optional<V> without breaking a lot of code.
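
To make the constraint concrete, here is a minimal sketch. It assumes java.util.Optional (the class that ships with Java 8) as a stand-in for Guava's Optional, and tryGet is a hypothetical helper, not part of any library. Since Map.get has to keep returning V, the wrapping can only happen at the call site:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OptionalMapDemo {
    // Hypothetical helper: Map.get itself cannot change its return type,
    // so each caller that wants an Optional has to wrap the result.
    static <K, V> Optional<V> tryGet(Map<K, V> map, K key) {
        return Optional.ofNullable(map.get(key));
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);

        // Existing call sites rely on the V return type and keep compiling,
        // but nothing reminds them that the result may be null.
        Integer legacy = ages.get("bob");

        // New call sites can opt in to the wrapper.
        Optional<Integer> wrapped = tryGet(ages, "bob");

        System.out.println(legacy == null);
        System.out.println(wrapped.isPresent());
    }
}
```

If Map.get returned Optional<V> directly, the "Integer legacy = ages.get(...)" line would stop compiling in every existing code base.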

Before Optional<T> got cool, one idea people had was to use annotations to do static analysis on the code. Mark a field as @Null, and you know it can be nullable; try to use it without checking for nullity, and you'll get a warning. Nullity can be propagated through result types and arguments, and it all checks out at compile time.

The best part is that you can retrofit it to existing classes. Map.get will never return an Optional<V>, but it could return a @Null V.
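
As a rough illustration, the annotated version might look like the sketch below. The @Null annotation is declared here only for the example (each real tool defines its own), and lookup and describe are hypothetical names. The point is that the return type stays V, so existing callers are untouched:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;

class NullAnnotationDemo {
    // Hypothetical marker annotation, declared locally for this sketch.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.FIELD, ElementType.PARAMETER})
    @interface Null {}

    // The return type is still V; only the annotation says it may be null.
    @Null
    static <K, V> V lookup(Map<K, V> map, K key) {
        return map.get(key);
    }

    static String describe(Map<String, String> map, String key) {
        String value = lookup(map, key);
        // A nullity checker would be expected to warn if this check were missing.
        if (value == null) {
            return "no value";
        }
        return "value: " + value;
    }
}
```

The Java compiler itself ignores the annotation; the warning would have to come from a separate static analysis pass that understands it.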

There were a few different attempts at these checks, each leading to different sets of annotations. If I had it my way, we'd see one of these — preferably a concise one — get Oracle's official blessing and widespread usage.

A type checker has to be conservative, and that means that you'd have to assume that legacy code always returns nullable references. On the other hand, for new code you'd want an un-annotated method to be assumed to be @NotNull (to cut down on verbosity). This mismatch could be solved in three ways.

1. Classes compiled with a new @NullChecked annotation would have their un-annotated methods assumed to be @NotNull.
2. All newly compiled code would be assumed to be @NullChecked.
3. The type checker could take additional inputs in the form of files that list methods which should be treated as @NotNull regardless of their bytecode.

The third one of those would mean that you could mark methods as not-nullable without touching their bytecode at all. This could be useful for some serialization issues, but more importantly, it would let people locally update projects without waiting on their maintainers.
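
Here is one way the "additional inputs" idea could be sketched. Everything in it is an assumption made for illustration: the class name NotNullOverrides, the one-method-per-line "className#methodName" format, and the "#" comment convention are not taken from any existing tool.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NotNullOverrides {
    private final Set<String> notNullMethods = new HashSet<>();

    // Hypothetical file format: one "className#methodName" entry per line;
    // blank lines and lines starting with '#' are skipped.
    NotNullOverrides(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                notNullMethods.add(trimmed);
            }
        }
    }

    // Conservative default: a method that is not listed is treated as nullable.
    boolean isAssumedNotNull(String className, String methodName) {
        return notNullMethods.contains(className + "#" + methodName);
    }

    public static void main(String[] args) {
        NotNullOverrides overrides = new NotNullOverrides(Arrays.asList(
                "# methods assumed never to return null",
                "java.lang.String#trim"));
        System.out.println(overrides.isAssumedNotNull("java.lang.String", "trim"));
        System.out.println(overrides.isAssumedNotNull("java.util.Map", "get"));
    }
}
```

A checker would consult something like isAssumedNotNull before flagging a dereference, which is what lets a project vouch for a third-party method without recompiling it.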

With that migration path in place, compilers could start treating unsafe dereferencing as errors rather than warnings. And maybe, just maybe, Java can recognize it as important enough to warrant syntactic sugar: T? as shorthand for @Null T. Kotlin employs a similar trick, and while I haven't actually used it, it sure looks nice.

There are other tricks you can do with annotations that expose a lot of power (including how it interacts with subtyping, etc), at the cost of more complexity. I'm not sure Java needs all those — but even without any of them it's still at least as powerful as Optional<T> — with the added benefit of backwards compatibility.

I'm not sure why annotation-based static analysis never caught on. Maybe the pushes were too fragmented, and developers weren't willing to hack in ugly ways to solve backwards compatibility (like my "additional inputs" file)? Maybe the edge cases are just too many and complicated? A quick Google search didn't give me any answers.

A lot of functional and functional-inspired languages don't have the concept of null. Instead, they have types called Maybe or Optional — basically a box of 0 or 1 items. Effes is going to take that approach, but I might put a twist on it.

In a nutshell, the idea behind Maybe (I'll settle on Haskell's terminology) is that there's a Nothing that represents the absence of something, a Just Foo that represents one Foo, and a Maybe Foo type which can be either a Nothing or a Just Foo.

Like other functional languages, Haskell has syntax (called pattern matching) that's kinda-sorta like an instanceof check plus a downcast in Java. Putting it all together looks something like this:

sayHi :: (Show e) => Maybe e -> String
sayHi Nothing = "Nothing to say hi to!"
sayHi (Just e) = "Hello, " ++ (show e) ++ "!"

(The (Show e) => syntax just means that e has a show method, which is like Java's toString.) In Effes, a direct translation would be a disjunctive type:

data Nothing
data Just[A] = elem : A
type Maybe[A] = Nothing | Just[A]

sayHi (m:Maybe[A Stringable]) -> String:
  case m of
      Nothing: "Nothing to say hi to!"
      Just e: "Hello, {{e}}!"

Because Effes has a more flexible type system, we can actually get away without the Just part of the Maybe pair. Instead, it looks something like this:

data Nothing
type Maybe[A] = Nothing | A

sayHi (m:Maybe[A Stringable]) -> String:
  case m of
      Nothing: "Nothing to say hi to!"
      e: "Hello, {{e}}!"

There's not a really strong driving force for this, except that it seems a bit cleaner. Instead of a Maybe being "either nothing or a box of one something," it's "either nothing or one thing." Plus it takes advantage of my cool new type system, so that's nice too.

The problem is when the A type is itself a Maybe: Maybe[Maybe[A]]. If we see that it contains a Nothing, does that mean we didn't have anything, or that we had one Nothing? To prevent unreachable code, I'd probably want the type checker to reject this altogether: Maybe[Maybe[String]] would be a type error.

That's not terrible, I guess, but the erroring type could be nestled in some data structure. For instance, if a linked list uses Maybe to signify an end, then LinkedList[Maybe[String]] wouldn't compile — and probably with some unintuitive or frustratingly un-actionable error message.
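
The same ambiguity has a familiar analogue in Java, which may make it easier to see. In the sketch below, null plays the role of an unboxed Nothing and java.util.Optional (the Java 8 class) plays the role of Just; flatLookup and nestedLookup are hypothetical helpers written for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class NestedMaybeDemo {
    // Unboxed: null stands for "nothing", so "no entry" and
    // "an entry whose value is nothing" collapse into the same result.
    static String flatLookup(Map<String, String> map, String key) {
        return map.get(key);
    }

    // Boxed: the outer Optional answers "is there an entry?",
    // the inner one answers "does the entry hold a value?".
    static Optional<Optional<String>> nestedLookup(Map<String, String> map, String key) {
        if (!map.containsKey(key)) {
            return Optional.empty();
        }
        return Optional.of(Optional.ofNullable(map.get(key)));
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k", null); // an entry that holds "nothing"

        // Both flat lookups return null and cannot be told apart.
        System.out.println(flatLookup(map, "k") == flatLookup(map, "missing"));

        // The nested lookups differ at the outer layer.
        System.out.println(nestedLookup(map, "k").isPresent());
        System.out.println(nestedLookup(map, "missing").isPresent());
    }
}
```

The boxed version costs an extra layer at every use, but it is the only one of the two that can represent "one Nothing" separately from "nothing at all".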

On balance, I'm leaning towards keeping the Just type. It doesn't add much complexity to the Maybe type, pattern matching keeps the call sites simple, and it eliminates ambiguity.

Friday, December 27, 2013

Of Optional and nulls

Tuesday, September 10, 2013

Maybe it's optional?