
Friday, May 9, 2014

Next up: generics!

I implemented open types in Effes the other day, so I’m gearing up for the next big push: generics! I was thinking of doing tuples first, but they have all of the same complexities as full-blown generics. (You can think of tuples as just sugar around predefined generic classes like Tuple3[A,B,C] — in fact, a bunch of languages do exactly that.)
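Scala does this quite literally: `(a, b, c)` is an instance of a generic `Tuple3` class. A rough Python sketch of the same idea, with an illustrative hand-rolled `Tuple3`:

```python
from typing import Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

# An ordinary generic class; a language could define (a, b, c) as pure
# sugar for Tuple3(a, b, c).
class Tuple3(Generic[A, B, C]):
    def __init__(self, first: A, second: B, third: C):
        self.first = first
        self.second = second
        self.third = third

point = Tuple3(1, "two", 3.0)
```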

Generics interact with type disjunction in interesting ways. For instance, what happens when you disjoin Box[A] and Box[B]? Is it a vanilla disjunction, or are disjunctions distributive, so that Box[A] | Box[B] becomes Box[A | B]? Both approaches have their pros and cons.

I’ll call the first one the “standard” option, and the second one the “distributive” one. I’ll illustrate with type Maybe[A] = One[A] | Nothing, which uses type One[A](elem: A). When you disjoin Maybe[A] | Maybe[B], Effes will expand both Maybes, leading to One[A] | Nothing | One[B] | Nothing, which simplifies to just One[A] | One[B] | Nothing. And then what?

The standard option is straightforward. When you pattern match, you have to specify which of the alternatives you want, filled out completely (with the generic parameter and all). This has the chief benefit of being simple, though the syntax it suggests is a bit clunky:

case mysteryBox of
    One[A](elem): handleA elem
    One[B](elem): handleB elem
    Nothing: handleNothing

The disjunctive interpretation, on the other hand, feels really dynamic, which I like. I think one of the strengths of Effes is that it gives you the feel of dynamic typing with the protections of static typing. In this view of things, mysteryBox isn’t one of three concrete options as above; it’s one of two options, the first of which is itself fuzzy.

For instance, let’s say we’re painting a layer with transparency. A given pixel could have a color or not, and the color could be specified by RGB value or by name: Maybe[Rgb] | Maybe[ColorName]. If there’s already a method paintPixel(color: Rgb | ColorName), the distributive option works perfectly. You don’t need to specify the generic parameter in the pattern match, because it’s unambiguous to the compiler:

case maybeColor of
    One(c): paintPixel c -- c: (Rgb | ColorName)
    Nothing: paintTransparency

This is nice, but I think there are times when the user won’t want that flexibility; they’ll want to treat each option separately. In a differently-factored version of the above, we may want the non-distributive option, so that we can feed the color to paintRgb or paintNamed, as appropriate.

One argument in favor of the distributive option is that it can simulate the standard option pretty easily:

case maybeColor of
    One(c): case c of
        Rgb: paintRgb c
        ColorName: paintNamed c
    Nothing: paintTransparency

That looks promising, but it’s actually very limited: it breaks down when the container can hold multiple items instead of just one. For instance, what if we want to paint a row of columns, typed as List[Rgb] | List[ColorName]? The nested case doesn’t work naturally. At best, we can wait for lambdas, then perform an inline map on the list, but that’s more complicated than it should be.

And lastly, the distributive approach takes a huge liberty with the programmer’s semantics. A List[A] is a homogeneous list of As; a List[A] | List[B] represents either a List[A] or a List[B]. To change that to a heterogeneous list of (A | B) is a big departure from the explicitly-written code.

All of that is to say that the standard system, despite its increased verbosity and stodgy syntax, is almost definitely the right approach. But wait! We can throw a bit of sugar at the problem to make the standard approach feel like the hip, distributive one!

The first problem with the syntax was that awkward combo of square brackets and parentheses: One[A](elem). We can solve this by borrowing from our method declaration syntax, and putting the type inside the parens: One(elem: A). Feels better already.

Next, we can take that one step further. If no type is specified, then the compiler will try to rewrite the case with each of the possible patterns, using the one in the code as a template. So, this:

case mysteryBox of
    One(elem): handle elem
    Nothing: handleNothing

… is just sugar for:

case mysteryBox of
    One(elem: A): handle elem
    One(elem: B): handle elem
    Nothing: handleNothing

One of the things I like about this is that it adds to the sugar of the language without adding to the amount of sugar the programmer needs to think about, because it complements the invoke-on-disjunction sugar so nicely.

One area that’s important to keep in mind is how types with multiple generic parameters will interact with error messages. Consider this snippet:

case foo of
    Pair(o1, o2): doSomethingWith o1 [o2]
    ...

(The syntax is a bit funky, and I may change it; but that just calls doSomethingWith with two arguments, o1 and o2. You can essentially ignore the square brackets.)

Here, o1 may be of type A or B, and o2 may be C or D. But we don’t get all four combinations: if o1 is A, then o2 must be C, and if o1 is B, then o2 must be D. That’s simple enough if you write the expansion out, but if you make a mistake in your head, the error message could confuse you more than it helps. For instance, imagine if doSomethingWith takes an A and a D, and you get an error message saying something like “doSomethingWith expected types [A | B, C | D] but saw [A, D].” Doesn’t that look like it’s complaining that it got good inputs? A better message would be “doSomethingWith expected types [A, C] or [B, D] but saw [A, D].” Even then, I’m not sure this would be clear to someone who’s new to the language.

Tuesday, April 22, 2014

Polymorphism using disjunctive types

I haven’t updated this blog in a while, but I’ve actually been making some pretty decent progress on Effes. I’ve got basic types working, method invocation, basic pattern matching and — wait for it! — disjunctive types!

My first attempt at the Effes compiler was a bit messy. I wrote the grammar first and tried to write the compiler over it, but I found I was getting confused as to which parts were complete, which were half-done, which were fully TODO, etc. So I took what I’d learned, chucked the code (and grammar) and started from scratch. This time I worked incrementally, adding features to the grammar and compiler/interpreter in sync and one at a time.

I haven’t made any progress on conjunctive types yet, but I realized I can go a long way without them. I can even get polymorphism, with just a touch of magic. Barely any at all, really.

Let me walk you through it, starting with a pretty simple program:

data type True:
    def toFalse -> False: return False
data type False:
    def toTrue -> True: return True

def not (b: True | False) -> True | False:
    return case b of
        True: (b: toFalse)
        False: (b: toTrue)

There are two things going on here: “downcasting” a disjunction to a simple type within each alternative, and disjoining the result types for the case expression as a whole.

First the downcasting. In each of the alternatives (the last two lines), the compiler is able to cast b from True | False to the matched type. For instance, b in the last line is typed as False, not True | False. This means that (b: toTrue) compiles and runs fine. Next, the result type. Since True::toFalse returns False, and False::toTrue returns True, the whole expression returns False | True (which is the same as True | False). If False::toTrue had returned True | FileNotFound, the expression would return False | True | FileNotFound.

So, there’s a tad of cleverness going on, but nothing too weird. If we rename both toTrue and toFalse to negate, we get two unrelated methods with the same name. It works exactly as above, but it’s starting to look polymorphic-ish:

data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

def not (b: True | False) -> True | False:
    return case b of
        True: (b: negate)
        False: (b: negate)

Okay, neat. But it’s still not really polymorphic, since there’s no “supertype” to speak of. There’s a bit of repetition with the negate methods, so what about this as a shortcut?

def not (b: True | False) -> True | False:
    return (b: negate)

b is still a disjunctive type in the last line, but the compiler is able to figure out that all of its alternatives have a method negate that takes zero arguments. It thus expands this invocation to the case expression as in the previous snippet. But I can’t provide a new implementation of a boolean type — that is, I can’t make FileNotFound a subclass of boolean — because b in the method arguments is typed specifically to True | False.

What would we even want a boolean type to be? A simple answer is to just make it an alias for True | False.

type Boolean = True | False

The left-hand side defines a type name, and the right-hand side defines a target type (simple type or disjunction). There’s no extra type checking; in this example, the compiler just replaces every instance of Boolean with True | False, as if you had written True | False out. The following methods are identical in every aspect except their names:

f1 (b: Boolean) -> True | False: return b
f2 (b: True | False) -> Boolean: return b

Extending aliases a bit, we can define an “open alias” which is just an alias to which other types can add themselves as disjunctive alternatives. Open aliases also declare methods, which all of their alternatives must also declare (and implement).

-- declare an open alias, Boolean
open type Boolean:
    def negate -> Boolean

-- declare True and False, which each declare a "negate" method
data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

-- Add True and False as disjunctive alternatives to Boolean
Boolean |= True
Boolean |= False

Note that True::negate and False::negate are totally unrelated methods. Neither relates to Boolean::negate, because there really is no such method. All there is is a requirement that any type that adds itself to the Boolean alias also declare a method named negate that takes no arguments and returns a type that’s contained within Boolean (for instance, True is contained within True | False). For that matter, Boolean itself doesn’t really exist: it’s just shorthand for True | False.
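The closest mainstream analogue to this requirement is structural typing; Python’s typing.Protocol captures roughly the same idea (a sketch of the analogy, not a faithful model of open aliases):

```python
from typing import Protocol

# Boolean is not a concrete runtime type here, just a structural
# requirement: anything with a zero-argument negate qualifies.
class Boolean(Protocol):
    def negate(self) -> "Boolean": ...

class TrueT:
    def negate(self):
        return FalseT()

class FalseT:
    def negate(self):
        return TrueT()

# TrueT.negate and FalseT.negate are unrelated methods, yet both
# classes satisfy the Boolean requirement.
def not_(b: Boolean) -> Boolean:
    return b.negate()
```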

When we put all of the above together, we get polymorphism! To illustrate, I’ll start with the finished program and show how it gets rewritten:

open type Boolean:
    def negate -> Boolean

data type True:
    def negate -> False: return False
Boolean |= True

data type False:
    def negate -> True: return True
Boolean |= False

def not (b: Boolean) -> Boolean:
    return (b: negate)

First, let’s rewrite the open alias as a standard alias:

type Boolean = True | False

data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

def not (b: Boolean) -> Boolean:
    return (b: negate)

Next, expand the Boolean alias:

data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

def not (b: True | False) -> True | False:
    return (b: negate)

And finally, rewrite (b: negate) as a case expression.

data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

def not (b: True | False) -> True | False:
    return case b of
        True: (b: negate)  -- True::negate
        False: (b: negate) -- False::negate

Effes programs don’t have dynamic linking yet, so the translation is really that literal. If and when I do implement dynamically linked programs, obviously that won’t work; the last step, to translate b: negate, will have to do something vtable-like.

Monday, March 17, 2014

Inheritance is dead, long live composition

One aspect of the type system that’s always left me unsatisfied is its asymmetry relative to traditional object-oriented languages. Most OO languages formally recognize inheritance within the type system, but not composition. Given that Effes formally recognizes composition, shouldn’t it not recognize inheritance?

This is important to me for more than just aesthetic reasons. Recognizing both patterns makes for a more complicated type system, but worse, it gives the programmer a too-easy crutch. One of the reasons I turned to Haskell when I was interested in learning about functional programming was that I wanted to force myself to really start thinking in FP terms. If I were learning on a language like Scala, which combines OO and FP patterns, it’d be too easy to fall back on familiar ways of looking at a problem.

In the same way, I want Effes to force me into thinking with a composition-based perspective, rather than letting me have another inheritance-based language with a shot of composition flavoring.

The hurdle, though, has been polymorphism. It’s useful to have a method that takes Sizeable objects, whether they’re List, Map or anything else that’s Sizeable. It’s also nice to have that size method on both List and Map.

My solution is to replace “List is-a Sizeable” with “List has-a Size component:”

type List[A]:
    has Size
    add A -> List[A]
    -- etc...

For a user of List to get to the size method, they’ll need to access its Size component, which can be done explicitly with (list @ Size) size. But, if the Size component doesn’t conflict with any other of List’s components, you can implicitly access it: list size. And similarly, if a method takes a Size argument, you can explicitly give it the list’s Size component by calling theMethod (list @ Size), but you can also just call theMethod list, and the compiler will figure out that you want to pass it the Size component.
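A rough Python sketch of this has-a scheme, with implicit component access modeled via attribute forwarding (all names here are illustrative):

```python
class Size:
    def __init__(self):
        self.count = 0

    def size(self):
        return self.count

class List:
    def __init__(self):
        self._size = Size()   # the Size component ("list @ Size")
        self._items = []

    def add(self, item):
        self._items.append(item)
        self._size.count += 1
        return self

    def __getattr__(self, name):
        # Implicit access: an unknown attribute is looked up on the
        # (sole, non-conflicting) component, mirroring "list size".
        return getattr(self._size, name)

lst = List().add("a").add("b")
```

Explicit access (`lst._size.size()`) and implicit access (`lst.size()`) both reach the same component method, which is the point of the call-site rule.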

A nice side benefit of all this is that it provides a nicer answer to the question of conflicting components, which I addressed in earlier posts. Rather than handling conflicts at composition time by knocking out some components, I’ll allow the conflict there, and force the user into stating which component they want, when there’s a conflict. So for instance, if List and Set both have an add method, you can’t write listAndSet add foo. You have to explicitly call out the component you want: (listAndSet @ List[Foo]) add foo.

There are two syntax details I have yet to work out with this all-composition scheme.

The first involves cleaning up the code when a type has only one component: ConsList[A] “implements” List[A], for instance. Everything is fine from a usage perspective, but it’s a bit awkward to write out:

type ConsList[A]:
    has List[A]:
        -- all of the ConsList code goes here

So, I’m thinking of allowing a special “is-a” statement for this situation, which just lets you inline the second line in the above:

type ConsList[A] is List[A]:
    -- all of the ConsList code goes here

The second is in cleaning up implementations of nested types. Remember how List had a Size component above? Does that mean we have to implement it as:

type ConsList[A] is List[A]:
    add elem: ...
    Size:
        size: ...

or can we just write:

type ConsList[A] is List[A]:
    add elem: ...
    size: ...

My inclination here is to mirror the call site rules: you can inline the method definitions for a given component if that component doesn’t conflict with any other components. That keeps things simple, consistent and clean.

Thursday, February 20, 2014

Method invocation syntax

I’ve been giving some thought to method invocation lately, trying to come up with something that’s fluent in simple cases, and familiar (to programmers) for the more complex cases. After a bit of playing around, I think I have a system I like.

Consider Java’s BigDecimal, and specifically its divide method. It feels very programmer-y:

aNum = myNum.divide(someOtherNum)

Wouldn’t it be nice if we could make this feel more natural?

aNum = myNum dividedBy someOtherNum

That suggests a grammar of object methodName arg0[, arg1, arg2...]. But if you have more than a couple args, this gets a bit muddy; the words all clump together in my brain, and it’s not entirely clear what’s what anymore: foo doBar baz, apple, banana, coconut. If anything, it looks like the logical grouping is (foo doBar baz) (apple banana coconut). Of course, it isn’t, and my brain knows that… but it’s not intuitive to my eye.

As I was looking around at various methods, I noticed another interesting thing: very often for multi-arg methods, there’s one “main” argument that’s followed by “secondary” arguments. In human-language grammar terms, there’s a single direct object, and then some adjectives and adverbs.

BigDecimal.divide(BigDecimal, RoundingMode) is a good example: the first argument is what you’re dividing by, and the second is just some almost-parenthetical info on how to do the division. It feels like this:

aNum = myNum dividedBy someOtherNum: HalfUp

This suggests a grammar of object methodName arg0 [: arg1, arg2, arg3...]. And that’s in fact what I think I’m going to go with (with a slight tweak that I’ll get to in a bit).
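Python’s decimal module shows the same primary-argument/adverb split: quantize takes the main operand plus a rounding-mode modifier, much like BigDecimal.divide(BigDecimal, RoundingMode).

```python
from decimal import Decimal, ROUND_HALF_UP

# Divide, then round the "main" result with an adverb-like mode.
quotient = (Decimal("7") / Decimal("3")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
```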

There’s an obvious problem, which is that not all methods follow that semantic pattern. For instance, List.sublist takes two arguments, fromIndex and toIndex. Neither modifies the other; they’re both “primary” args. (This may have been different if the arguments were fromIndex and length, but they’re not). You really do want to invoke this using the parentheses we all know and love:

aList = myList sublist (3, 7)

Yikes — does that mean I need two ways to invoke methods? Worse yet, do I let the call sites determine which to use, so that sometimes I’ll see myList sublist 3: 7 and sometimes I’ll see myNum dividedBy (someOtherNum, HALF_UP)? The latter isn’t bad, but I don’t want my language to encourage inconsistent style on things like this. So maybe I want to let the method declaration define which syntax to use… but how?

The solution is actually pretty simple: methods like sublist take only one arg, but it’s a tuple! That’s not enforced by the language, of course, but the syntax for declaring methods should mirror the syntax for calling them, so that things will naturally work out.
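In Python terms, the idea is just that sublist takes one argument which happens to be a tuple (a sketch; this sublist is hypothetical):

```python
# One argument, a (fromIndex, toIndex) tuple -- "myList sublist (3, 7)".
def sublist(items, bounds):
    start, end = bounds  # unpack the single tuple argument
    return items[start:end]
```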

The one big issue with that grammar is that the : char is already used in lots of places, and in particular as a way of declaring a variable’s type (including to upcast it). For instance, myNum dividedBy someOtherNum : SomeType is ambiguous; does it take one arg, someOtherNum : SomeType, or does it take two args, someOtherNum and SomeType?

To solve this, I’m going to make a slight aesthetic concession and replace the : with {...} in method invocation.

aNum = myNum dividedBy someOtherNum { HalfUp } -- two args, num and mode
aList = myList sublist (3, 7)        -- one arg, a tuple of (start, end)

As I mentioned above, the method declaration should mirror invocation. Something like:

dividedBy divisor: BigDecimal { mode: RoundingMode } -> BigDecimal: ...
sublist (start: Int, end: Int) -> List[E]: ...

I like this approach a lot, except for the curly braces. Ideally I’d use a colon, or even a pipe, but all of the single-char approaches I could think of would either cause ambiguity or be ugly. For instance, a pipe would be fine at the call site, but create visual ambiguities at declaration:

dividedBy divisor: BigDecimal | mode: RoundingMode -> BigDecimal: ...

That pipe looks like a disjunctive type at a glance. This isn’t an ambiguity from the grammar’s perspective, since mode is clearly a variable name and not a type (Effes enforces the capitalization scheme that Java suggests), but it’s not nice on the eyes. Some optional parentheses would help, but it’s hard to get excited about that. So for now, curly braces are it.

The thing I like about this syntax is that with one rule, I get everything I want. Simple methods look fluent; methods with adverbs look good (if a tad clunky with the braces); and in the worst case, I get something that’s no worse than what most of the popular languages out there require or recommend.

Thursday, February 13, 2014

Statements and expressions: an exploration of ambiguity

I've been working on the parser for Effes a bit, and I got a bit stuck on an ambiguity in case constructs; I want them to work as either statements or expressions.

To anchor things a bit, here are two uses of case, one of which is used as an expression, and the other as a statement:

-- as an expression
firstInt = case ints of
    (): 0
    (head, tail): head

-- as a statement
case ints of
    (): print "empty list!"
    (_): print "list has one element"
    _: print "list has #{ints size} elements"

Languages handle this in various ways that make things simple. For instance:

  • In Java, it's always unambiguous whether something is expected to be a statement or an expression.
  • In Haskell, each function is just a single (potentially complex) expression; there are no statements, and thus no ambiguity
  • In Scala, you can put an expression anywhere in a function body, and the last expression is the function's return value — so again there's no ambiguity, because you can just make case constructs (match as they're called in Scala) always be expressions.

Scala's approach works, but it also lets you define a function as def g() = { 1; 2; 3; }, which I don't like. Statements and expressions are different beasts to me, and conflating them seems like a lazy and inelegant solution.

So then, when a case like the ones above is the entire body of some method f, is that case introducing a statement or an expression?

One solution is to take a hint from Java and have method bodies always consist of statements. If we take that approach, f... : case ints of is a statement. To make it be an expression whose value is returned, we'd have to write f... : return case ints of....

That's not the end of the world. In fact, I've never liked Ruby-style implicit returns, where you just plop an expression at the end of a method:

def ugly
    123
end

There are a few reasons I don't like this, but the main reason is that in an imperative context (which a Ruby/Scala/etc method is), returning a value is an action. It should look like one! When I write imperative code, I'm telling the computer a series of actions to take. An implicit return feels like this to me:

  • First, ask the user how many apples they want.
  • Then find out how many apples are available.
  • Then, the minimum of that number and the number of apples requested.

That last sentence feels wrong, because it's not a sentence; it's a phrase. You can figure out what it means, but it feels stilted.

On the other hand, when writing one-liners, the return feels superfluous. Here's a nice size function for a list:

size -> Int: case this of
    (): 0
    _: 1 + (this tail) size

One option I'm considering is to look at the return value if the method is a one-liner (that is, just a single statement or expression — even if it's complex). If it's Unit, that one line is a statement; otherwise, it's an expression. (If the function's body is a block instead of a one-liner, that block consists entirely of statements, including possibly a return statement.)
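The proposed rule can be sketched as a tiny decision function (the names here are hypothetical; this is just the rule, not a compiler):

```python
def body_kind(is_one_liner: bool, result_type: str) -> str:
    # Blocks are always statement lists; one-liners are classified by
    # their result type: Unit means statement, anything else expression.
    if not is_one_liner:
        return "statements"
    return "statement" if result_type == "Unit" else "expression"
```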

This feels a bit subtle and potentially confusing, and maybe that should be a big warning. On the other hand, I think that for most cases, it'll "just work." Crucially, since this only applies to one-liners, nearly all the cases should hopefully be simple cases. I can't think of any that wouldn't be.

This approach also means that the compiler will have to know about the Unit type specially. My instincts are that this smells wrong, but maybe it's not so bad.

Ah, what the heck. Despite all these warning bells going off, I'll try it out. If nothing else, it'll be good to see if my intuition (that this is a sketchy idea) is right, and why specifically. As Batman Begins put it, we fall so we can learn to pick ourselves up.

Thursday, July 11, 2013

CoffeeScript should handle callbacks better

I want to add a quick addendum to yesterday's post about best practices. I mentioned the staircase problem caused by Node's reliance on callbacks: if one action is a prerequisite for other actions in a method (for instance, you query a database and then act on those results), the rest of that method ends up indented.

Node has another big problem, which is that its target language, JavaScript, is awful. Luckily, one of Node's third-party modules, CoffeeScript, provides a decent language that compiles down into JavaScript. We get all the Node goodness without the JavaScript badness!

Since CoffeeScript is a new language that can evolve quickly, and since one of its main use cases is Node, and since Node relies heavily on callbacks... why not add some sugar to make callbacks a bit nicer? I propose a way to bind callback arguments to left-hand variables. This is actually pretty similar to what Haskell does with its do notation, and for pretty similar reasons.

Let's take a simple, imperative-with-exceptions snippet of code:

try
  res1 = func1 arg1
  [res2a, res2b] = func2 res1
  if res2a is "foo"
    doFoo()
  else
    res3 = func3 res2b
    doBar res3
catch err
  handle err

That's pretty simple. Watch how gross it turns when we use callbacks instead of just returning back the results:

func1 arg1, (err, res1) ->
  if err?
    handle err
  else
    func2 res1, (err, res2a, res2b) ->
      if err?
        handle err
      else if res2a is "foo"
        doFoo()
      else
        func3 res2b, (err, res3) ->
          if err?
            handle err
          else
            doBar res3

My suggestion is to create some sugar for that. It would look something like this:

do and throw err...
  (err, res1) <- func1 arg1
  (err, res2a, res2b) <- func2 res1
  if res2a is "foo"
    doFoo()
  else do...
    (err, res3) <- func3 res2b
    doBar res3
  catch err
    handle err

Notice how similar this is to the original, easy-to-read, imperative style. The general idea is simple: the new do... syntax introduces a block of code in which callback variables can be bound on the left-hand side. Every time that happens, it starts a new callback nested in the previous one. If you provide the and throw varname syntax, then it treats left-hand bound variables of this name as errors, and if one ends up being non-null, its callback will run the code in the catch block and nothing else.

I won't pretend this is a small bit of sugar; it probably has some interesting edge cases, and the concept might be a bit weird to grok for someone who's new to it. But it's an elegant solution to a real problem that's pretty significant for a major part of CoffeeScript's target audience.

Thursday, June 27, 2013

Creating composite objects

Up until now, I've talked mostly about types. In this post, I'm going to take a slight detour and talk about objects. This won't be the most jaw-dropping of posts, but it will be helpful for the discussion on subtypes, which is necessary for talking about function resolution.

I'll start by knocking out two really easy cases: uncomposed types and union types.

data Foo = Foo(foo : Int)
data Bar

a = Foo(123) -- uncomposed type
b : Foo | Bar = Foo(456)

In this example, a has an inferred compile-time type of Foo, and the code creates a new Foo object whose foo value is 123. Yawn. Then, b has an explicit compile-time type of Foo | Bar, and the code again creates a Foo object. Yawn again. But all of a sudden...

c : Foo Bar = Foo(789)

Ah, this is interesting. The object being created is a Foo, and yet it's being assigned to a Foo Bar, which is essentially a subtype of Foo! This would be like a snippet of Java code reading Car c = new Vehicle() (if Car is a subclass of Vehicle). That's not allowed in Java, so why would it be in Effes?

What's really going on in that example is this:

fTmp : Foo = Foo(789)
bTmp : Bar = Bar()
c : Foo Bar = fTmp bTmp

Just as a conjunctive composed type is created by just writing its two component types side-by-side, a composed object is created by writing its two component objects side-by-side. Simple as that!

The original syntax of the c assignment was actually sugar. If the right-hand side of an assignment is of type A, and the left-hand side is of type A B, and B is a type which doesn't require any explicitly-provided state, then an object of type B is assumed. That is, the original c : Foo Bar = Foo(789) was automatically expanded to c : Foo Bar = Foo(789) Bar() because the Bar type doesn't require any explicitly-provided state.

What happens if you do need to add state? You have two options, both analogous to the syntax for constructing uncomposed objects. You can use the parenthetical constructor syntax for each uncomposed object, and just list the objects side-by-side (this is the syntax I've been using in this post so far). You can also use where syntax, listing all of the fields you want to set in the indented field-assignment block. You can prefix any field name with its type to qualify it; this is optional in most cases, but required when field names collide.

data Foo = Foo(foo : Int, fizz : String)
data Bar = Bar(bar : Int, fizz : Int)

fooBar1 = Foo(123, "Hello") Bar(456, 789)
fooBar2 = Foo Bar where
    foo = 123 -- could have been 'Foo foo'
    Foo fizz = "Hello"
    bar = 456 -- could have been 'Bar bar'
    Bar fizz = 789

Note that the order of these fields doesn't matter — you can interleave them, whatever. Each field's name unambiguously identifies it, and they all belong to the single type Foo Bar, so there's nothing special about any particular order. That said, it's probably good form to group fields by component type.

I'm considering adding an additional syntax, which really treats Foo Bar as the new, separate type it is:

fooBar3 = (Foo Bar)(foo=123, Foo fizz="Hello",
                    bar=456, Bar fizz=789)

This has a certain symmetric elegance to it, but it's kinda ugly and potentially confusing. I was going to disallow it, but I think I have to let it in because of nicknames. While the example above is ugly, this starts to make sense:

nickname FooBar = Foo Bar
fooBar4 = FooBar(foo=123, Foo fizz="Hello",
                 bar=456, Bar fizz=789)

I suppose I could allow that syntax only if the composite type has been nicknamed (and is being constructed as such), but this feels like it would complicate the language for not much gain. Better to say that the (Foo Bar)(...) syntax is allowed but discouraged. At least, I hope that's better.

Tuesday, June 25, 2013

Sugar and the "data" keyword

In my last post, I discussed the data keyword as sugar for stateful, struct-like traits. This is mostly pure sugar, although it's also the only way to define a stateless, methodless trait (like Nothing). I'd like to justify those decisions, and also introduce two more pieces of sugar.

As I touched on in an earlier post, there is a balance to be had between deep abstractions vs. ease of use. Syntactic sugar, if designed correctly, can help that balance by highlighting certain aspects of an abstraction. This is useful both for the language design, which can now have a high-level abstraction whose concrete implications are clearer; and for an individual program's design, where it's more obvious which part of the abstraction is important for a given type or object.

The data syntax aims to do this by recognizing that a type whose only functions are getters feels different than one with abstract virtual methods. The first is used purely to store state, whereas the latter doesn’t even care about state directly — just about behavior. Effes combines both of these concepts into traits, so sugar can help clarify which aspect is more important for a given type. When you want to focus exclusively on the state-storing ability of a trait, data is a better option, and when you need the behavior-defining aspect of a trait, the default syntax is a better (well, only) option.

As for requiring data syntax for stateless, methodless traits, this is partly pragmatics and partially a philosophical consideration. Given that the syntax for an abstract trait is something like this:

Sizeable:
  size : Int

... how would Nothing look with this syntax? It'd probably be something like one of these:

Nothing:
-- rest of your code here, unindented

Nothing -- no colon
-- rest of your code here, unindented

Nothing:
  pass

The first two feel like weird, dangling tokens; I don't like them from an aesthetic perspective. The last one borrows from Python's pass statement, which inserts a runtime no-op. It was designed for exactly the kind of situations Nothing would have using default trait syntax: the language requires something, but you want nothing. I've always thought this was a bit ugly. Within the context of a function, one could just use return instead; and within the context of a class definition, pass suggests that something weird is going on — a methodless, init-less, stateless class is a weird beast. In Effes, such a thing is indeed useful, but it feels very much like a data type; so, rather than introducing new syntax for it, or generalizing the default syntax to allow weird, ambiguous-looking constructs (like the first or second examples above), I just require the data syntax, which does the job just fine.

Finally, as promised, here are two more minor pieces of sugar for the data syntax. Firstly, you can define multiple types on one line, separated by a semicolon. And secondly, you can follow a data definition with "nicknamed foo" to automatically create a nickname for the union of all of the types defined on that data line. So this:

data LT
data GT
data EQ
nickname Comparison = LT | GT | EQ

...can be expressed more succinctly as:

data LT; GT; EQ nicknamed Comparison

Again, this tries to smooth the transition between abstractions and ease of use. One common use case for a type system is to define enum types; a comparison is either less than, greater than, or equal to. In a language like Haskell, these are different constructors of the same type. In Effes, they would be defined as the union type of the three traits, and one would almost definitely want a nickname for that union type. This sugar allows us to emphasize the enum-like characteristics of the union of stateless, methodless traits.
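The enum-like flavor is easy to see next to Python, where the same three-way comparison type is a built-in pattern:

```python
from enum import Enum

# Comparison as an enumeration of three stateless alternatives --
# the same shape as the LT | GT | EQ union above.
class Comparison(Enum):
    LT = "LT"
    GT = "GT"
    EQ = "EQ"
```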

The use case for this sugar is definitely enum-like types, so I'm tempted to declare that the sugar only works if all of the data types are stateless. This feels slightly more hacky, but it's also easier to reverse: generalizing the syntax will be backwards compatible, whereas restricting it (from all data types to just stateless ones) in the future could break code. I don't anticipate that backwards compatibility will be a major problem for Effes, but I think I'll take the safer approach for now, as an exercise in language evolution if nothing else.