Yuval Shavit: May 2014

I implemented open types in Effes the other day, so I’m gearing up for the next big push: generics! I was thinking of doing tuples first, but they have all of the same complexities as full-blown generics. (You can think of tuples as just sugar around predefined generic classes like Tuple3[A,B,C] — in fact, a bunch of languages do exactly that.)

Generics interact with type disjunction in interesting ways. For instance, what happens when you disjoin Box[A] and Box[B]? Is it a vanilla disjunction, or are disjunctions distributive, so that Box[A] | Box[B] becomes Box[A | B]? Both approaches have their pros and cons.

I’ll call the first one the “standard” option, and the second one the “distributive” one. I’ll illustrate with type Maybe[A] = One[A] | Nothing, which uses type One[A](elem: A). When you disjoin Maybe[A] | Maybe[B], Effes will expand both Maybes, leading to One[A] | Nothing | One[B] | Nothing, which simplifies to just One[A] | One[B] | Nothing. And then what?

The standard option is straightforward. When you pattern match, you have to specify which of the alternatives you want, filled out completely (with the generic parameter and all). This has the chief benefit of being simple, though the syntax it suggests is a bit clunky:

case mysteryBox of
    One[A](elem): handleA elem
    One[B](elem): handleB elem
    Nothing: handleNothing

The distributive interpretation, on the other hand, feels really dynamic, which I like. I think one of the strengths of Effes is that it gives you the feel of dynamic typing with the protections of static typing. In this view of things, mysteryBox isn’t one of three concrete options as above; it’s one of two options, the first of which is itself fuzzy.

For instance, let’s say we’re painting a layer with transparency. A given pixel could have a color or not, and the color could be specified by RGB value or by name: Maybe[Rgb] | Maybe[ColorName]. If there’s already a method paintPixel(color: Rgb | ColorName), the distributive option works perfectly. You don’t need to specify the generic parameter in the pattern match, because it’s unambiguous to the compiler:

case maybeColor of
    One(c): paint c -- c:(Rgb | ColorName)
    Nothing: paintTransparency

This is nice, but I think there are times when the user won’t want that flexibility; they’ll want to treat each option separately. In a differently-factored version of the above, we may want the non-distributive option, so that we can feed the color to paintRgb or paintNamed, as appropriate.

One argument in favor of the distributive option is that it can simulate the standard option pretty easily:

case maybeColor of
    One(c): case c of
        Rgb: paintRgb c
        ColorName: paintNamed c
    Nothing: paintTransparency

That looks promising, but it’s actually very limited: it breaks down when the container can hold multiple items, instead of just one. For instance, what if we want to paint a row of columns, typed as List[Rgb] | List[ColorName]? The nested case doesn’t work naturally. At best, we can wait for lambdas, then perform an inline map on the list, but that’s more complicated than it should be.

And lastly, the distributive approach takes a huge liberty with the programmer’s semantics. A List[A] is a homogeneous list of As; a List[A] | List[B] represents either a List[A] or a List[B]. To change that to a heterogeneous list of (A | B) is a big departure from the explicitly-written code.
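
To make that difference concrete, here's a small Python sketch. It's only an analogy: `int` and `str` stand in for A and B, and the helper names `is_list_of`, `fits_standard`, and `fits_distributive` are made up for illustration.

```python
def is_list_of(items: list, t: type) -> bool:
    # True when every element is an instance of t (a homogeneous list).
    return all(isinstance(x, t) for x in items)

def fits_standard(items: list) -> bool:
    # List[int] | List[str]: the whole list is ints, or the whole list is strs.
    return is_list_of(items, int) or is_list_of(items, str)

def fits_distributive(items: list) -> bool:
    # List[int | str]: each element may independently be an int or a str.
    return all(isinstance(x, (int, str)) for x in items)

mixed = [1, "two", 3]
```

Here `mixed` satisfies `fits_distributive` but not `fits_standard`, which is exactly the set of values the distributive rewrite would quietly let in.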

All of that is to say that the standard system, despite its increased verbosity and stodgy syntax, is almost definitely the right approach. But wait! We can throw a bit of sugar at the problem to make the standard approach feel like the hip, distributive one!

The first problem with the syntax was that awkward combo of square brackets and parentheses: One[A](elem). We can solve this by borrowing from our method declaration syntax, and putting the type inside the parens: One(elem: A). Feels better already.

Next, we can take that one step further. If no type is specified, then the compiler will try to rewrite the case with each of the possible patterns, using the one in the code as a template. So, this:

case mysteryBox of
    One(elem): handle elem
    Nothing: handleNothing

… is just sugar for:

case mysteryBox of
    One(elem: A): handle elem
    One(elem: B): handle elem
    Nothing: handleNothing

One of the things I like about this is that it adds to the sugar of the language without adding to the amount of sugar the programmer needs to think about, because it complements the invoke-on-disjunction sugar so nicely.
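
The rewrite itself can be sketched as a tiny string transformation in Python. This is purely illustrative and says nothing about how the Effes compiler actually does it: `expand_pattern` and the `(constructor, type argument)` tuples are hypothetical names for this example.

```python
from typing import List, Optional, Tuple

# (constructor name, type argument or None if the constructor takes none)
Alternative = Tuple[str, Optional[str]]

def expand_pattern(ctor: str, binder: str, alternatives: List[Alternative]) -> List[str]:
    """Expand an untyped pattern like One(elem) into one typed pattern
    per alternative of the disjunction that uses the same constructor."""
    return [
        f"{ctor}({binder}: {type_arg})"
        for alt_ctor, type_arg in alternatives
        if alt_ctor == ctor and type_arg is not None
    ]

# mysteryBox : One[A] | One[B] | Nothing
mystery_box = [("One", "A"), ("One", "B"), ("Nothing", None)]
expanded = expand_pattern("One", "elem", mystery_box)
```

With these inputs, `expanded` holds the two typed patterns from the desugared case above, and each one would reuse the same right-hand side from the template.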

One area that’s important to keep in mind is how types with multiple generic parameters will interact with error messages. Consider this snippet:

case foo of
    Pair(o1, o2): doSomethingWith o1 [o2]
    ...

(The syntax is a bit funky, and I may change it; but that just calls doSomethingWith with two arguments, o1 and o2. You can essentially ignore the square brackets.)

Here, o1 may be of type A or B, and o2 may be C or D. But we don’t get all four combinations: if o1 is A, then o2 must be C, and if o1 is B, then o2 must be D. That’s simple enough if you write the expansion out, but if you make a mistake in your head, the error message could confuse you more than it helps. For instance, imagine if doSomethingWith takes an A and a D and you get an error message saying something like “doSomethingWith expected types [A | B, C | D] but saw [A, D].” Doesn’t that look like it’s complaining that it got good inputs? A better message would be “doSomethingWith expected types [A, C] or [B, D] but saw [A, D].” Even then, I’m not sure this would be clear to someone who’s new to the language.
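
A short Python sketch shows why the first message misleads. It assumes foo is Pair[A, C] | Pair[B, D]; the helpers `cross_product` and `describe_mismatch` are hypothetical, and the types are just strings.

```python
from itertools import product
from typing import List, Tuple

TypePair = Tuple[str, str]

def cross_product(alternatives: List[TypePair]) -> List[TypePair]:
    """Every combination that a message like [A | B, C | D] seems to allow."""
    firsts = sorted({first for first, _ in alternatives})
    seconds = sorted({second for _, second in alternatives})
    return list(product(firsts, seconds))

def describe_mismatch(fn_name: str, alternatives: List[TypePair], seen: TypePair) -> str:
    """Format the clearer message, listing each correlated alternative."""
    expected = " or ".join(f"[{a}, {b}]" for a, b in alternatives)
    return f"{fn_name} expected types {expected} but saw [{seen[0]}, {seen[1]}]"

# Pair[A, C] | Pair[B, D]: only two of the four combinations really occur.
pair_alternatives = [("A", "C"), ("B", "D")]
message = describe_mismatch("doSomethingWith", pair_alternatives, ("A", "D"))
```

The pair `("A", "D")` shows up in `cross_product(pair_alternatives)` but not in `pair_alternatives` itself, which is the gap between what the first message implies and what the expansion actually produces.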

In my last post, I talked about open aliases and how they can be used to achieve polymorphism. Since then, I’ve been a bit stuck on the exact syntax for them. I don’t know if that’s silly or useful; syntax seems like such a superficial concern, but then again, it makes a difference if a language looks nice.

Here’s the syntax I used in that last post:

open type Boolean:
    def negate -> Boolean

type True:
    def negate -> False: return False
Boolean |= True

This has some nice elements, but it also has some negatives.

Pro: open type is pretty explicit
Con: Requires adding open as a keyword, but it’s a natural function name for I/O (like opening a stream)
Pro: Boolean |= mirrors the familiar |= operator (from other languages we know and love), so that we naturally read it as “Boolean is whatever it previously was, or True”
Con: |= doesn’t lend itself to being put in the type definition, as opposed to top-level as above. It would have to look something like this:
```
type True:
    |= Boolean
    def negate -> False: return False
```
… but that’s not good because it reads as True |= Boolean, which is the flip of what we really want to say. If we want to say that True is an alternative for Boolean from within True’s definition, we really need the open type to be on the right of the statement.

I tried various other alternatives. For instance, I thought about using ellipses to mark open types (type Boolean = ...), but ellipses are commonly used in code fragments to say “some code goes here,” and I didn’t want to introduce that ambiguity. For adding to an open type, I even went as far as considering True (- Boolean, where (- was supposed to look like ∈. Nice try, but nope.

Here’s the syntax I settled on in the end:

type Boolean = ?
    def negate -> Boolean

type True:
    def negate -> Boolean
    is Boolean
    ...

(Note that in this latest snippet, ... is back to its usual, informal definition of “some code goes here.”) This does require adding is as a keyword, but I’m not too worried about that. My bigger concern with is is that it evokes the “is a” concept from OO, but I think I’m just going to have to bite the bullet on that; everything else I can think of is worse.
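
For readers more at home in Python, the idea of an open type that other types opt into can be loosely imitated with a registry. This is an analogy rather than Effes semantics: `OpenType`, its `add` decorator, and the `True_` and `False_` classes are all hypothetical names, and `@Boolean.add` plays the role of `is Boolean`.

```python
class OpenType:
    """A named set of alternatives that can grow after it is declared."""

    def __init__(self, name: str):
        self.name = name
        self.alternatives = []

    def add(self, cls):
        # Used as a class decorator: registers cls as an alternative.
        self.alternatives.append(cls)
        return cls

    def accepts(self, value) -> bool:
        # True when value is an instance of any registered alternative.
        return isinstance(value, tuple(self.alternatives))


Boolean = OpenType("Boolean")  # roughly: type Boolean = ?


@Boolean.add  # roughly: is Boolean
class True_:
    def negate(self):
        return False_()


@Boolean.add
class False_:
    def negate(self):
        return True_()
```

As with `is Boolean`, the registration lives inside the definition of the alternative, with the open type named on the right.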

Friday, May 9, 2014

Next up: generics!

Monday, May 5, 2014

Syntax for open types