Yuval Shavit: nicknames

Tuesday, June 25, 2013

Sugar and the "data" keyword

In my last post, I discussed the data keyword as sugar for stateful, struct-like traits. This is mostly pure sugar, although it's also the only way to define a stateless, methodless trait (like Nothing). I'd like to justify those decisions, and also introduce two more pieces of sugar.

As I touched on in an earlier post, there is a balance to be had between deep abstractions vs. ease of use. Syntactic sugar, if designed correctly, can help that balance by highlighting certain aspects of an abstraction. This is useful both for the language design, which can now have a high-level abstraction whose concrete implications are clearer; and from an individual program's design, where it's more obvious which part of the abstraction is important for a given type or object.

The data syntax aims to do this by recognizing that a type whose only functions are getters feels different than one with abstract virtual methods. The first is used purely to store state, whereas the latter doesn't even care about state directly — just about behavior. Effes combines both of concepts into traits, so sugar can help clarify which aspect is more important for a a given type. When you want to focus exclusively on the state-storing ability of a trait, data is a better option, and when you need the behavior-defining aspect of a trait, the default syntax is a better (well, only) option.

As for requiring data syntax for stateless, methodless traits, this is partly pragmatics and partially a philosophical consideration. Given that the syntax for an abstract trait is something like this:

Sizeable:
  size : Int

... how would Nothing look with this syntax? It'd probably be something like one of these:

Nothing:
-- rest of your code here, unindented

Nothing -- no colon
-- rest of your code here, unindented

Nothing:
  pass

The first two feel like weird, dangling tokens; I don't like them from an aesthetic perspective. The last one borrows from Python's pass statement, which inserts a runtime no-op. It was designed for exactly the kind of situations Nothing would have using default trait syntax: the language requires something, but you want nothing. I've always thought this was a bit ugly. Within the context of a function, one could just use return instead; and within the context of a class definition, pass suggests that something weird is going on — a methodless, init-less, stateless class is a weird beast. In Effes, such a thing is indeed useful, but it feels very much like a data type; so, rather than introducing new syntax for it, or generalizing the default syntax to allow weird, ambiguous-looking constructs (like the first or second examples above), I just require the data syntax, which does the job just fine.

Finally, as promised, here are two more minor pieces of sugar for the data syntax. Firstly, you can define multiple types on one line, separated by a semicolon. And secondly, you can follow a data definition with "nicknamed foo" to automatically create a nickname for the union of all of the types defined on that data line. So this:

data LT
data GT
data EQ
nickname Comparison = LT | GT | EQ

...can be expressed more succinctly as:

data LT; GT; EQ nicknamed Comparison

Again, this tries to smooth the transition between abstractions and ease of use. One common use case for a type system is to define enum types; a comparison is either less than, greater than, or equal to. In a language like Haskell, these are different constructors of the same type. In Effes, they would be defined as the union type of the three traits, and one would almost definitely want a nickname for that union type. This sugar allows us to emphasize the enum-like characteristics of the union of stateless, methodless traits.

The use case for this sugar is definitely enum-like types, so I'm tempted to declare that the sugar only works if all of the data types are stateless. This feels slightly more hacky, but it's also easier to reverse: generalizing the syntax will be backwards compatible, whereas restricting it (from all data types to just stateless ones) in the future could break code. I don't anticipate that backwards compatibility will be a major problem for Effes, but I think I'll take the safer approach for now, as an exercise in language evolution if nothing else.

Tuesday, June 11, 2013

Nicknames, function resolution and edge cases

In my first post about the Effes type system, lied a bit in my last post when I said that nicknames are just typing shortcuts. They also provide function resolution rules, in case a data type satisfies assumptions with conflicting methods. For instance, let's say we had the following type and assumption:

data PoppedBalloon (color :: Color) -- or something
Balloon -> :: PoppedBalloon

In the last post, I introduced a Node[A] that satisfies the Stack[A] assumption from a couple posts ago. Now, I also want to say that a Node[A] and Nothing both satisfy the Balloon assumption. Easy enough:

Node[A] -> Balloon st pop _ = PoppedBalloon(Red)
Nothing -> Balloon st pop _ = PoppedBalloon(GhostGray)

This is obviously a strained example, but besides the fact that these implementations are useless, here's the real problem:

foo :: (Nothing | Node[A]) = optionallyGetFoo()
bar = pop foo -- what is bar's type?

In the first line, we try to get a Node[A], but the method's signature tells us we might not be able to (maybe a lookup fails, for instance). If it fails, the method will return a Nothing, and so our foo has to be typed to accept either result value. So far, so good. The question is, what's bar? We can think of foo in two ways:

It's either a Node[A], which satisfies Balloon, or it's a Nothing, which also satisfies Balloon. Either way, foo is a Balloon, and pop foo returns a PoppedBalloon
It's the union of (Nothing | Node[A]), which satisfies Stack[A], meaning that pop foo returns a (Nothing | Node[A])

I haven't yet nailed down how I want this to work, but my current thinking is that if there is ambiguity, the programmer will have to be explicit about the intent. Something like:

bar1 = pop (foo :: Stack[A]) -- bar1::Stack[A]
bar2 = pop (foo::LinkedList[A]) -- bar2::LinkedList[A]

This has a couple interesting edge cases. Firstly, what happens if a union type (such as LinkedList[A] satisfies an assumption, and one (but not both) of its composite types defines a method that conflicts. For instance, Stack[A] defines a method head. What if you also provided a function on Node[A] alone that conflicts with this? In fact, that method already exists! When we defined the Node[A] data type as:

data Node[A] = head :: A, next :: Nothing | Node[A]

the compiler automatically created two functions:

head Node[A] :: A
next Node[A] :: Nothing | Node[A]

A type foo :: Node[A] cause a conflict with Stack[A], because (as mentioned in the previous post) a Node[A] is a subtype of Nothing | Node[A], which satisfies Stack[A]. The solution here is that Effes will take the most specific function it can; in this case, the one declared to take only a Node[A], that is, the auto-generated getter.

Okay, so foo :: Node[A] means that head foo invokes the getter, while foo :: Nothing | Node[A] means that head foo invokes the Stack[A] method. But what if we additionally define a method that takes Nothing:

head (_ :: Nothing) :: Int  = -1

Now we're back to an ambiguity in the case of foo :: Nothing | Node[A]. The compiler can say that head foo invokes the Stack[A] version of head and thus return a Possible[A]; or that it will switch at runtime between two specific functions, head Node[A] :: A and head Nothing :: Int, and thus return a A | Int. Again, the rule is to favor the specific calls; the compiler will chose the second of those two options.

Finally, let's say we define a method that takes a union type:

head (a :: Nothing | Node[A]) :: (String | Float) =
    case a of
        Nothing -> "I am a fish"
        Node[A] -> 3.14

Now what does head foo return? In this case, it invokes that new method, and thus returns a type of String | Float. This seems a bit inconsistent — aren't I now picking a less specific type? Yes, but it's one that is otherwise impossible to name, so this is the only way I have to invoke it! Whereas before, I could "cast" the Nothing | Node[A] to a LinkedList[A] or even a Stack[A], here I have nothing other than Nothing | Node[A] to cast it to in order to tell the resolution engine to pick this new (and really useless) method. If I want to invoke the two specific methods, I have to manually invoke case:

barUseless = head foo -- bar :: String | Float
barSlightlyLessUseless = case foo of
    Nothing -> head foo -- :: Int
    Node[A] -> head foo -- :: A
-- barSlightlyLessUseless :: Int | A

This is admittedly awkward, but I think it's an edge case that won't come into play very often. I hope.

In summary, functions are resolved in this order:

union types
data types
nicknames
assumptions

That last one may strike you as a bit odd. After an assumption is just a contract, not a specific implementation, right? So far, yes; but in the next post, I'll talk about stateful assumptions that do have implementations.