Wednesday, July 8, 2015

In Soviet Russia, recursion invokes you!

This is one of those “I know what worked, but I don’t quite know what I learned” posts.

To implement my type-safe pattern matching, I have to essentially recurse down into two structures: the type of the thing being matched, and the individual pattern that’s trying to match it. That is, given:

t: List[Maybe[Boolean]]     //          call this the haystack type
case t of
    Cons(One(True), _): doWhatever(...) // and this the needle type

… I need to recurse both into the List[Maybe[Boolean]] and into the Cons(One(True), _). The question is, which of those looks like recursion, and which looks like a dispatch? The problem is interesting because both structures are polymorphic: the haystack can be a simple type (Cons(head: T, tail: List[T])) or a union type (Cons(...) | Empty), while the needle type can either be a concrete type (One(True)) or a wildcard (_).

Essentially, some piece of code has to look something like this:

Result recursePolymorphically(Arg arg) {
  if (arg instanceof OneThing) { ... }
  else if (arg instanceof AnotherThing) { ... }
}

and the question is whether I:

  • recurse into the haystack polymorphically, and instanceof-ify the needle
  • recurse into the needle polymorphically, and instanceof-ify the haystack

A quick observation that drove my thinking on this: the haystack type is potentially infinite (the tail of a Cons is a disjunction that includes a Cons, the tail of which is a disjunction that includes a Cons, and so on) while the needle is always finite. Thus, the base case must depend on the needle.

I tried the first of those first, since I like the idea of the base case being driven off the method’s arguments rather than an attribute of this; with the needle as an argument, the base case is when that argument represents a wildcard or a concrete, no-arg type.
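To make that first option concrete, here is a minimal Java sketch (Java 17, for records and instanceof patterns). It is not the actual implementation: every name in it (Needle, Wildcard, CtorNeedle, Haystack, Ctor, Union, HaystackFirstDemo) is hypothetical, and it simplifies the result to a plain boolean instead of a refined type. The haystack is the receiver of the polymorphic call, the needle is the argument that gets instanceof-ified, and constructor arguments are thunks so that a recursive type like a list can be described at all.

```java
import java.util.List;
import java.util.function.Supplier;

// Needle: a finite pattern, either a wildcard or a constructor with sub-patterns.
interface Needle {}
record Wildcard() implements Needle {}
record CtorNeedle(String name, List<Needle> args) implements Needle {}

// Haystack: a type that may be infinite, so constructor arguments are thunks.
interface Haystack {
    boolean matches(Needle needle);
}

record Ctor(String name, List<Supplier<Haystack>> args) implements Haystack {
    public boolean matches(Needle needle) {
        if (needle instanceof Wildcard) return true; // base case comes from the argument
        if (!(needle instanceof CtorNeedle p)) throw new IllegalArgumentException("unknown needle");
        if (!p.name().equals(name) || p.args().size() != args.size()) return false;
        for (int i = 0; i < args.size(); i++) {
            // The thunk has to be forced before the call can be dispatched on it.
            if (!args.get(i).get().matches(p.args().get(i))) return false;
        }
        return true;
    }
}

record Union(List<Haystack> alternatives) implements Haystack {
    public boolean matches(Needle needle) {
        if (needle instanceof Wildcard) return true;
        return alternatives.stream().anyMatch(alt -> alt.matches(needle));
    }
}

class HaystackFirstDemo {
    static int forced = 0; // how many constructor-argument thunks were forced

    static Supplier<Haystack> thunk(Supplier<Haystack> s) {
        return () -> { forced++; return s.get(); };
    }

    // Boolean = True | False
    static Haystack booleanType() {
        return new Union(List.of(new Ctor("True", List.of()), new Ctor("False", List.of())));
    }

    // List[Boolean] = Cons(Boolean, List[Boolean]) | Empty, with a lazy tail
    static Haystack listOfBoolean() {
        return new Union(List.of(
            new Ctor("Cons", List.of(thunk(HaystackFirstDemo::booleanType),
                                     thunk(HaystackFirstDemo::listOfBoolean))),
            new Ctor("Empty", List.of())));
    }

    // Matches the needle Cons(True, _) against List[Boolean].
    static boolean run() {
        Needle needle = new CtorNeedle("Cons",
            List.of(new CtorNeedle("True", List.of()), new Wildcard()));
        return listOfBoolean().matches(needle);
    }

    public static void main(String[] args) {
        System.out.println("matched=" + run() + " forced=" + forced);
    }
}
```

In this sketch the counter ends at 2: the head thunk is forced because the needle really does need to look at it, but the tail thunk is forced too, even though the needle only has a wildcard there. The receiver of a polymorphic call has to exist before the call can happen.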

The problem is that this didn’t work well with laziness (the subject of my next post), since it meant forcing the haystack more than I wanted. This in turn caused types to explode out much faster than they needed to. Instead of ending up with something like:

t': Cons(Nothing, ...)
  | Empty

I might end up with something like:

t': Cons(Nothing, Cons(Nothing, ...) | Empty)
  | Cons(Nothing, Cons(Maybe[True], ...) | Empty)
  | Cons(Nothing, Cons(Maybe[False], ...) | Empty)
  | Empty

This made testing more difficult, as I had to start balancing thoroughness and verbosity in the test code. It also meant that in the case of a compilation error, the user would see a much more confusing error message. Would you rather see “saw True but expected Cons(Nothing, …)” or “saw True but expected Cons(Nothing, Cons(Nothing, …) | Empty) | Cons(Nothing, Cons(Maybe[True], …) | Empty) | Cons(Nothing, Cons(Maybe[False], …) | Empty) | Empty”? I just wrote that, and I can barely even make sense of it!

As it happens, the testing was important because this approach lent itself to more bugs, though I haven’t done the introspection to figure out why.

The explosion problem pretty much went away when I switched the recursion and the dispatch. Now, the polymorphic call to the needle’s Any form can take its unforced argument (a thunk representing some piece of the haystack’s type) and shove it into the result, still in its lazy form. The result becomes something like matched={<thunk>}, which keeps things from having to expand further.
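Here is a rough Java sketch of that flipped arrangement (Java 17 again). As before, it is an illustration under assumptions rather than the real code: the names Ty, CtorTy, UnionTy, Pat, AnyPat, CtorPat, MatchResult and NeedleFirstDemo are all made up, and the result is simplified to a matched flag plus the list of thunks captured by wildcards. The needle is now the receiver of the polymorphic call, and the haystack arrives as an unforced thunk that gets instanceof-ified only when a constructor pattern actually needs to inspect it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Haystack types are plain data now; constructor arguments stay lazy.
interface Ty {}
record CtorTy(String name, List<Supplier<Ty>> args) implements Ty {}
record UnionTy(List<Ty> alternatives) implements Ty {}

// Result: did it match, and which pieces of the haystack were left as thunks.
record MatchResult(boolean matched, List<Supplier<Ty>> captured) {
    static final MatchResult FAIL = new MatchResult(false, List.of());
}

interface Pat {
    MatchResult match(Supplier<Ty> haystack);
}

record AnyPat() implements Pat {
    public MatchResult match(Supplier<Ty> haystack) {
        // Base case: keep the thunk as-is, without forcing it.
        return new MatchResult(true, List.of(haystack));
    }
}

record CtorPat(String name, List<Pat> args) implements Pat {
    public MatchResult match(Supplier<Ty> haystack) {
        Ty ty = haystack.get(); // forced only because this pattern must inspect it
        if (ty instanceof UnionTy u) {
            for (Ty alt : u.alternatives()) {
                MatchResult r = match(() -> alt);
                if (r.matched()) return r;
            }
            return MatchResult.FAIL;
        }
        if (!(ty instanceof CtorTy c)) throw new IllegalArgumentException("unknown type");
        if (!c.name().equals(name) || c.args().size() != args.size()) return MatchResult.FAIL;
        List<Supplier<Ty>> captured = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            MatchResult r = args.get(i).match(c.args().get(i)); // the thunk is passed along unforced
            if (!r.matched()) return MatchResult.FAIL;
            captured.addAll(r.captured());
        }
        return new MatchResult(true, captured);
    }
}

class NeedleFirstDemo {
    static int forced = 0; // how many constructor-argument thunks were forced

    static Supplier<Ty> thunk(Supplier<Ty> s) {
        return () -> { forced++; return s.get(); };
    }

    // Boolean = True | False
    static Ty booleanType() {
        return new UnionTy(List.of(new CtorTy("True", List.of()), new CtorTy("False", List.of())));
    }

    // List[Boolean] = Cons(Boolean, List[Boolean]) | Empty, with a lazy tail
    static Ty listOfBoolean() {
        return new UnionTy(List.of(
            new CtorTy("Cons", List.of(thunk(NeedleFirstDemo::booleanType),
                                       thunk(NeedleFirstDemo::listOfBoolean))),
            new CtorTy("Empty", List.of())));
    }

    // Matches the needle Cons(True, _) against List[Boolean].
    static MatchResult run() {
        Pat needle = new CtorPat("Cons", List.of(new CtorPat("True", List.of()), new AnyPat()));
        return needle.match(NeedleFirstDemo::listOfBoolean);
    }

    public static void main(String[] args) {
        MatchResult r = run();
        System.out.println("matched=" + r.matched()
            + " captured=" + r.captured().size() + " forced=" + forced);
    }
}
```

With the same needle, Cons(True, _), only the head thunk gets forced in this sketch. The tail comes back inside the result as a single captured thunk, which is roughly the matched={<thunk>} shape described above.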

Which gets me back to the question at the top: what did I learn? I still don’t know. I tried one technique, identified its flaws, and tried the other — but I didn’t learn how to pick techniques better. My algorithm is still a linear search.

Maybe what I learned is that the base case should be driven off polymorphism, not off method arguments. Or maybe it’s that lazy evaluation and polymorphism don’t mix well. Or maybe that sometimes, you just have to use brute force to figure new stuff out.
