Showing posts with label polymorphism. Show all posts

Friday, May 9, 2014

Next up: generics!

I implemented open types in Effes the other day, so I’m gearing up for the next big push: generics! I was thinking of doing tuples first, but they have all of the same complexities as full-blown generics. (You can think of tuples as just sugar around predefined generic classes like Tuple3[A,B,C] — in fact, a bunch of languages do exactly that.)

Generics interact with type disjunction in interesting ways. For instance, what happens when you disjoin Box[A] and Box[B]? Is it a vanilla disjunction, or are disjunctions distributive, so that Box[A] | Box[B] becomes Box[A | B]? Both approaches have their pros and cons.

I’ll call the first one the “standard” option, and the second one the “distributive” one. I’ll illustrate withtype Maybe[A] = One[A] | Nothing, which uses type One[A](elem: A). When you disjoin Maybe[A] | Maybe[B], Effes will expand both Maybes, leading to Maybe[A] | Maybe[B] | Nothing | Nothing, which simplifies to just Maybe[A] | Maybe[B] | Nothing. And then what?

The standard option is straightforward. When you pattern match, you have to specify which of the alternatives you want, filled out completely (with the generic parameter and all). This has the chief benefit of being simple, though the syntax it suggests is a bit clunky:

case mysteryBox of
    One[A](elem): handleA elem
    One[B](elem): handleB elem
    Nothing: handleNothing

The disjunctive interpretation, on the other hand, feels really dynamic, which I like. I think one of the strengths of Effes is that it gives you the feel of dynamic typing with the protections of static typing. In this view of things, mysteryBox isn’t one of three concrete options as above; it’s one of two options, the first of which is itself fuzzy.

For instance, let’s say we’re painting a layer with transparency. A given pixel could have a color or not, and the color could be specified by RGB value or by name: Maybe[Rgb] | Maybe[ColorName]. If there’s already a method paintPixel(color: Rgb | ColorName), the distributive option works perfectly. You don’t need to specify the generic parameter in the pattern match, because it’s unamibiguous to the compiler:

case maybeColor of
    One(c): paint c -- c:(Rgb | ColorName)
    Nothing: paintTransparency

This is nice, but I think there are times when the user won’t want that flexibility; they’ll want to treat each option separately. In a differently-factored version of the above, we may want the non-distributive option, so that we can feed the color to paintRgb or paintNamed, as appropriate.

One argument in favor of the distributive option is that it can simulate the standard option pretty easily:

case maybeColor of
    One(c): case c of
        Rgb: paintRgb c
        ColorName: paintNamed c
    Nothing: paintTransparency

That looks promising, but it’s actually very limited: it breaks down when the container can hold multiple items, instead of just one. For instance, what if we want to paint a row of columns, typed as List[Rgb] | List[NamedColor]? The nested case doesn’t work naturally. At best, we can wait for lambdas, then perform an inline map on the list, but that’s more complicated than it should be.

And lastly, the distributive approach takes a huge liberty with the programmer’s semantics. A List[A] is a homogeneous list of As; a List[A] | List[B] represents either a List[A] or a List[B]. To change that to a heterogeneous list of (A | B) is a big departure from the explicitly-written code.

All of that is to say that the standard system, despite its increased verbosity and stodgy syntax, is almost definitely the right approach. But wait! We can throw a big of sugar at the problem to make the standard approach feel like the hip, distributive one!

The first problem with the syntax was that awkward combo of square brackets and parenthesis: One[A](elem). We can solve this by borrowing from our method declaration syntax, and putting the type inside the parens: One(elem: A). Feels better already.

Next, we can take that one step further. If no type is specified, then the compiler will try to rewrite the case with each of the possible patterns, using the one in the code as a template. So, this:

case mysteryBox of
    One(elem): handle elem
    Nothing: handleNothing

… is just sugar for:

case mysteryBox of
    One(elem: A): handle elem
    One(elem: B): handle elem
    Nothing: handleNothing

One of the things I like about this is that it adds to the sugar of the language without adding to the amount of sugar the programmer needs to think about, because it complements the invoke-on-disjunction sugar so nicely.

One area that’s important to keep in mind is how types with multiple generic parameters will interact with error messages. Consider this snippet:

case foo of
    Pair(o1, o2): doSomethingWith o1 [o2]
    ...

(The syntax is a bit funky, and I may change it; but that just calls doSomethingWith with two arguments, o1 and o2. You can essentially ignore the square brackets.)

Here, o1 may be of type A or B, and o2 may be C or D. But we don’t get all four combinations: if o1 is A, then o2 must be C, and if o1 is B, then o2 must be D. That’s simple enough if you write the expansion out, but if you make a mistake in your head, the error message could confuse you more than it helps. For instance, imagine if doSomethingWith takes an A and a D and you get an error message saying something like “doSomethingWith expected types [A| B, C | D] but saw [A, D].” Doesn’t that look like it’s complaining that it got good inputs? A better message would be doSomethingWith expected types [A, C] or [B, D] but saw [A, D].” Even then, I’m not sure this would be clear to someone who’s new to the language.

Tuesday, April 22, 2014

Polymorphism using disjunctive types

I haven’t updated this blog in a while, but I’ve actually been making some pretty decent progress on Effes. I’ve got basic types working, method invocation, basic pattern matching and — wait for it! — disjunctive types!

My first attempt at the Effes compiler was a bit messy. I wrote the grammar first and tried to write the compiler over it, but I found I was getting confused as to which parts were complete, which were half-done, which were fully TODO, etc. So I took what I’d learned, chucked the code (and grammar) and started from scratch. This time I worked incrementally, adding features to the grammar and compiler/interpreter in sync and one at a time.

I haven’t made any progress on conjunctive types yet, but I realized I can go a long way without them. I can even get polymorphism, with just a touch of magic. Barely any at all, really.

Let me walk you through it, starting with a pretty simple program:

data type True:
    def toFalse -> False: return False
data type Fale:
    def toTrue -> True: return True

def not (b: True | False) -> True | False:
    return case b of
        True: (b: toFalse)
        False: (b: toTrue)

There are two things going on here: “downcasting” a disjunction to a simple type within each alternative, and disjoining the result types for the case expression as a whole.

First the downcasting. In each of the alternatives (the last two lines), the compiler is able to cast b from True | False to the matched type. For instance, b in the last line is typed as False, not True | False. This means that (b: toTrue) compiles and runs fine. Next, the result type. Since True::toFalse returns False, and False::toTrue returns True, the whole expression returns False | True (which is the same as True | False). If False::toTrue had returned True | FileNotFound, the expression would return False | True | FileNotFound.

So, there’s a tad of cleverness going on, but nothing too weird. If we rename both toTrue and toFalse to negate, we get two unrelated methods with the same name. It works exactly as above, but it’s starting to look polymorphic-ish:

data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

def not (b: True | False) -> True | False:
    return case b of
        True: (b: negate)
        False: (b: negate)

Okay, neat. But it’s still not really polymorphic, since there’s no “supertype” to speak of. There’s a bit of repetition with the negate methods, so what about this as a shortcut?

def not (b: True | False) -> True | False:
    return (b: negate)

b is still a disjunctive type in the last line, but the compiler is able to figure out that all of its alternatives have a method negate that takes zero arguments. It thus expands this invocation to the case expression as in the previous snippet. But I can’t provide a new implementation of a boolean type — that is, I can’t make FileNotFound a subclass of boolean — because b in the method arguments is typed specifically to True | False.

What would we even want a boolean type to be? A simple answer is to just make it an alias for True | False.

type Boolean = True | False

The left-hand side defines a type name, and the right-hand side defines a target type (simple type or disjunction). There’s no extra type checking; in this example, the compiler just replaces every instance of Boolean with True | False, as if you had written True | False out. The following methods are identical in every aspect except their names:

f1 (b: Boolean) -> True | False: return b
f2 (b: True | False) -> Boolean: return b

Extending aliases a bit, we can define an “open alias” which is just an alias to which other types can add themselves as disjunctive alternatives. Open aliases also declare methods, which all of their alternatives must also declare (and implement).

-- declare an open alias, Boolean
open type Boolean:
    def negate -> Boolean

-- declare True and False, which each declare a "negate" method
data type True:
    def negate -> True: return False
data type False:
    def negate -> False: return True

-- Add True and False as disjunctive alternatives to Boolean
Boolean |= True
Boolean |= False

Note that True::negate and False::negate are totally unrelated methods. Neither relates to Boolean::negate, because there really such a method. All there is is a requirement that any type that adds itself to the Boolean alias also declare a method named negate that takes no arguments and returns a type that’s contained within Boolean (for instance, True is contained within True | False). For that matter, Boolean itself doesn’t really exist: it’s just shorthand for True | False.

When we put all of the above together, we get polymorphism! To illustrate, I’ll start with the finished program and show how it gets rewritten:

open type Boolean:
    def negate -> Boolean

data type True:
    def negate -> False: return False
Boolean |= True

data type False:
    def negate -> True: return True
Boolean |= False

def not (b: Boolean) -> Boolean:
    return (b: negate)

First, let’s rewrite the open alias as a standard alias:

type Boolean = True | False

data type True:
    def negate -> False: return False
data type False:
    def negate -> False: return True

def not (b: Boolean) -> Boolean:
    return (b: negate)

Next, expand the Boolean alias:

data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

def not (b: True | False) -> True | False:
    return (b: negate)

And finally, rewrite (b: negate) as a case expression.

data type True:
    def negate -> False: return False
data type False:
    def negate -> True: return True

def not (b: True | False) -> True | False:
    return case b of
        True: (b: negate)  -- True::negate
        False: (b: negate) -- False::negate

Effes programs don’t have dynamic linking yet, so the translation is really that literal. If and when I do implement dynamically linked programs, obviously that won’t work; the last step, to translate b: negate, will have to do something vtable-like.

Monday, March 17, 2014

Inheritance is dead, long live composition

One aspect of the type system that’s always left me unsatisfied is its asymmetry against traditional object-oriented languages. Most OO languages formally recognize inheritance within the type system, but not composition. Given that Effes formally recognizes composition, shouldn’t it not recognize inheritance?

This is important to me for more than just aesthetic reasons. Recognizing both patterns makes for a more complicated type system, but worse, it gives the programmer a too-easy crutch. One of the reasons I turned to Haskell when I was interested in learning about functional programming was that I wanted to force myself to really start thinking in FP terms. If I were learning on a language like Scala, which combines OO and FP patterns, it’d be too easy to fall back on familiar ways of looking at a problem.

In the same way, I want Effes to force me into thinking with a composition-based perspective, rather than letting me have another inheritance-based language with a shot of composition flavoring.

The hurdle, though, has been polymorphism. It’s useful to have a method that takes Sizeable objects, whether they’re List, Map or anything else that’s Sizeable. It’s also nice to have that size method on both List and Map.

My solution is to replace “List is-a Sizeable” with “List has-a Size component:”

type List[A]:
    has Size
    add A -> List[A]
    -- etc...

For a user of List to get to the size method, they’ll need to access its Size component, which can be done explicitly with (list @ Size) size. But, if the Size component doesn’t conflict with any other of List’s components, you can implicitly access it: list size. And similarly, if a method takes a Size argument, you can explicitly give it the list’s Size component by calling theMethod (list @ Size), but you can also just call theMethod list, and the compiler will figure out that you want to pass it the Size component.

A nice side benefit of all this is that it provides a nicer answer to the question of conflicting components, which I addressed in earlier posts. Rather than handling conflicts at composition time by knocking out some components, I’ll allow the conflict there, and force the user into stating which component they want, when there’s a conflict. So for instance, if List and Set both have an add method, you can’t write listAndSet add foo. You have to explicitly call out the component you want: (listAndSet @ List[Foo]) add foo.

There are two syntax details I have yet to work out with this all-composition scheme.

The first involves cleaning up the code when a type has only one component: ConsList[A] “implements” List[A], for instance. Everything is fine from a useage perspective, but it’s a bit awkward to write out:

type ConsList[A]:
    has List[A]:
        -- all of the ConsList code goes here

So, I’m thinking of allowing a special “is-a” statement for this situation, which just lets you inline the second line in the above:

type ConsList[A] is List[A]:
    -- all of the ConsList code goes here

The second is in cleaning up implementations of nested types. Remember how List had a Size component above? Does that mean we have to implement it as:

type ConsList[A] is List[A]:
    add elem: ...
    Size:
        size: ...

or can we just write:

type ConsList[A] is List[A]:
    add elem: ...
    size: ...

My inclination here is to mirror the call site rules: you can inline the method definitions for a given component if that component doesn’t conflict with any other components. That keeps things simple, consistent and clean.

Sunday, January 12, 2014

A solution to the object-equality problem

As I mentioned in my last post, there are certain methods that require transitivity — and that means both objects must agree on a single implementation of those methods. For instance, if a Ball expects to compare itself against another Ball using only its radius, a WeightedBall cannot use radius and weight, even if it's being compared to another WeightedBall. If it did, transitivity could break when a plain, unweighted Ball joins the party.

My solution in Effes is to define compiletime methods, which look like normal instance methods but are in fact similar to Comparator objects whose instances are picked at compile time. This is basically an implementation of the strategy pattern.

To see this in action, consider the following snippet:

type Eq:
    [compiletime] == (other: This) -> Bool

data type Ball(radius: Int):
    is Eq
    [override compiletime] (== other):
        radius == other.radius

type WeightedBall:
    is Ball -- also inherits Eq
    weight -> Int:
    [override compiletime] (== other):
        radius == other.radius && weight == other.weight

    ByRadius:
        [override compiletime] (== other): (this: Ball) == other

First we define an Eq type, with a compiletime method named ==. This method takes one argument, which must be of the same type as the declaring class (the This class is a reserved class name, just as this is a reserved variable name in many OO languages, including Effes). The This restriction is re-applied in each specific type, so Ball == must take a Ball argument (not just a Eq).

The WeightedBall subtype of Ball redefines == to include the two balls' weights. But it also defines a "variant" called ByRadius, which only compares radii. It does this by casting itself down to a plain Ball, and then calling ==, such that this call to == has a most specific implementation of (Ball ==).

To someone trying to compare two objects, the == method looks like a normal method call. But under the hood, Effes delegates it to a strategy object. Here's a simple set implementation which maintains a list of elements and just checks each new element as it comes in, to make sure it's not already in the list:

type ListSet[T:Eq]:
  val values : List[T] = Empty
  add elem:T -> ListSet[T]: if values contains T
    then this
    else values' = values push elem
  contains (needle : T): case of:
    Nothing: False
    (head, tail): case head of:
      needle: True
      _: tail contains elem

The == magic happens in the last three lines. The case head of statement tries to match head against needle, using ==. If they match, the answer is True; otherwise, we try recursively on the tail list.

As you can see, contains takes one argument, the needle we're looking for. But under the hood, it actually takes two arguments: the needle, and an "equality strategy" that holds some implementation of ==. This object is picked at compile time, ensuring that runtime polymorphism doesn't screw up transitivity.

For instance, let's try to add a heavyBall : WeightedBall to two ListSets, one typed to Ball and the other to WeightedBall:

plainSet : ListSet[Ball] = getSomeListSet
weightedSet : ListSet[WeightedBall] = getSomeListSet
s1 = plainSet add heavyBall
s2 = weightedSet add heavyBall

Note that both variables are initially assigned to identical sets, meaning that both have all WeightedBall elements. But, plainSet is typed to ListSet[Ball]. The last two lines are internally translated to something like:

s1_ = plainSet add heavyBall (Ball ==)
s2_ = weightedSet add heavyBall (WeightedBall ==)

At the == call site (in ListSet contains), the case head of ... needle snippet is translated to something like if (_equalityTester head needle), where _equalityTester is the strategy object.

If we wanted the set to contain WeightedBalls, but only restrict to unique radii, we would write:

uniqueRadii : ListSet[WeightedBall:ByRadius] = getSomeListSet
s3 = uniqueRadii add heavyBall

The WeightedBall:ByRadius bit tells the compiler to pick the ByRadius variant we defined above:

s3_ = uniqueRadii add heavyBall (WeightedBall:ByRadius ==)

There's one important bit to note here. Comparing two objects by radius is going to give you a different result than comparing them by radius and weight; there could be more duplicates. So when we cast the result of getSomeListSet to ListSet[Ball], that's not the relatively light operation that a cast usually is. We have to actually rebuild the ListSet so that it knocks out those extra duplicates. I'm still working on the mechanism for this; maybe the casting is only allowed if the type defines a method for it, something like:

type ListSet[T:Eq]:
    @cast ListSet[T2]: ListSet[T2]() addAll values

The signatures of these compiletime methods are a bit deceitful. They look like instance methods, and even define this but they're actually just normal two-arg methods. This is of course the same for all instance methods (this is always just an implicit argument), but it feels like more of a lie here. I think it's because the this argument isn't even semantically special; the use case for compiletime methods is precisely that this and other are on the same footing!

The reason I went for the cheat is simply that it feels more natural to me, even if it's a bit inaccurate. It lets us write things like a == b (instead of == a b, as would be the case for a non-instance method) without tricky rules for declaring methods as inline. My goal from the beginning was to try to get the best of both worlds that Java offers: the correctness of Comparators with the natural feel of this.equals. I think this approach succeeds in that goal, at the cost of pretty minimal sugar/lies.

Thursday, January 2, 2014

Equalities in object-oriented programs

There's a problem in object-oriented programming: equality and polymorphism don't generally get along. Specifically, extra state introduced in subclasses breaks important properties of equality like transitivity. I have an idea for a solution in Effes which, though I'm sure isn't novel, I haven't seen in other languages. But first, I'd like to take a post to explore some aspects of the problem.

Consider two classes: Ball and WeightedBall extends Ball. A Ball has one piece of state, radius; two objects are equal if they're both balls with the same radius. A WeightedBall adds another piece of state, weight. How does this factor into equality?

One answer could be that WeightedBall checks to see if the other object is a WeightedBall or just a plain Ball. If it's a WeightedBall, look at both the weight and the radius; otherwise, just look at the radius. In pseudocode:

boolean isEqual(Object other):
    if other is a WeightedBall:
        return this.radius == other.radius
            && this.weight == other.weight
    else if other is a Ball:
        return this.radius == other.radius
    else:
        return false

This approach fails because it doesn't satisfy transitivity. Consider three balls and some equalities:

Given:
- ball5 = Ball(radius=5)
- lightBall5 = WeightedBall(radius=5, weight=10)
- heavyBall5 = WeightedBall(radius=5, weight=200)
Then:
- ball5 == lightBall5 (only check radius)
- ball5 == heavyBall5 (only check radius)
- lightBall5 != heavyBall5 (check radius and weight)

In a language like Java, there aren't any great solutions to this. Either you require that both objects have exactly the same class, so that a Ball is never equal to a WeightedBall; or you don't let subclasses add state to the equality check, so that lightBall5 and heavyBall5 are equal because they only check radius, as specified by the Ball.equals contract.

This problem occurs with ordering, too. Java handles this variant by letting you create explicit Comparator classes, so that a Comparator<? super Ball> only checks radius while a Comparator<? super WeightedBall> also checks weight. This also lets you define several orderings, so that you can sort balls by radius, weight, bounciness, or any other property.

The Comparator pattern is really the right one. It lets the programmer pick the right ordering for each task, and it makes reflexivity and transitivity easy. The problem is that it's verbose: every method that cares about T's ordering needs to take a Comparator<? super T>, and every class that defines an ordering needs to create a nested class, preferably as a singleton.

Java addressed this by creating a Comparable interface for providing a "natural" ordering; basically, a class can be its own comparator. That's helpful for the programmer who's writing a comparable class, but it inflates APIs that use ordering. Each one needs two variants: one that takes a T extends Comparable<? super T> and another that takes a T and a Comparator<? super T>. Here are two methods from java.util.Collections, for instance:

public static <T extends Comparable<? super T>>
       void sort(List<T> list)
public static <T>
       void sort(List<T> list, Comparator<? super T> c)

The Comparator pattern is a step in the right direction, but it does nothing for equality, nor for its close cousin, hashing. If you create a HashSet<WeightedBall>, it can only use the default definition of equals and hashCode that WeightedBall defines — and which must still be consistent with Ball's equals and hashCode. I suppose you could create a Hasher interface similar to Comparator, but nobody does — the JDK certainly doesn't. Your only option is to wrap each WeightedBall in a WeightedBallByWeight object that defines equality/hash.

This hybrid approach — Comparable and equals/hashCode on the object, Comparator as a separate object — also leads to inconsistencies between ordering and equality. The JavaDocs for TreeSet includes this bit:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.

This means it's virtually impossible to use a Comparator correctly in some cases. If WeightedBall above went with the exact-class approach (so that it can check both radius and weight), you can't create a TreeSet<WeightedBall> with a custom comparator that only checks radius; if you did, you could have two balls that are not equal (same radius, different weights) but can't both be in the set, because the custom ordering says they are equal.

Similarly, if you decided to let Ball define the equality interface, so that WeightedBall doesn't check weight, then you can't create a TreeSet<WeightedBall> with a Comparator that also factors in weight. If you did, you could have two objects that are both in the set, and yet are equals to one another!

What we really need is a hierarchy of Comparator-like interfaces:

Equalitator<T> with boolean areEqual(T a0, T a1)
- Hasher<T> extends Equalitator<T> with int hashCode(T a0)
- Comparator<T> extends Equalitor<T> with int compare(T a0, T a1) and a restriction that areEqual(a, b) == (compare(a, b) == 0)

As I teased above, I have an idea for how to solve this in Effes. Stay tuned!

Friday, October 11, 2013

Of object composition and a maybe-dead cat

There's a problem with Effes that could prove deadly, at least to object composition: composed objects can conflict in ways that can't be caught at compile time, and there's no satisfying, simple way to resolve the conflicts. It's a problem I realized a few weeks ago while thinking about various use cases for composition.

To warm up, consider these types:

Animal
Dog is Animal
Happy
(Happy Dog) is Animal

Dog and Happy are simple types, and (Happy Dog) is their conjunction. Animal is an "interface-y" type with implementations provided by Dog and (Happy Dog). Let's say balto has a compile-time and runtime type of Happy Dog, and that Animal provides a method speak. Does balto speak invoke the implementation for Dog or Happy Dog?

That's not so tricky (I said it was a warmup!). Happy Dog is intuitively more specific than Dog, so we should invoke its speak. Let's try something just a bit trickier:

type Animal:
  speak -> String
data Dog is Animal:
  speak = "Woof!"
data Cat is Animal
  speak = "Meow!"

balto = Dog
felix = Cat
dogCat = balto @ felix
spoken = dogCat speak

This is trickier. The object dogCat is both a Dog and a Cat, both of which provide equally-specific overrides of Animal::speak. Which gets invoked? If your instinct is to just reject that at compile-time, consider this slightly more dynamic variant:

type Animal:
  speak -> String
data Talking, Dog, Cat -- no inherent relationships to Animal
type Talking Dog is Animal:
  speak = "Woof!"
type Talking Cat is Animal:
  speak = "Meow!"

balto = Talking Dog
felix : Cat = Talking Cat
dogCat = balto @ felix
spoken = dogCat speak

Here, felix has a run-time type of Cat but a compile-time type of Talking Cat, which is Animal. This means we can't detect a conflict at compile time, but dogCat still has two equally-specific implementations of speak at runtime.

We can come up with something clever based on compile-time contortions: take only the left object's implementation when they're composed, for instance, or take only the one whose compile-time type provides an implementation and fail to compile if they both provide an implementation. These would all work for the examples so far, though they're not very inspiring.

The problem is that Effes, like other functional languages, lets you do pattern matching — that is, runtime type checks. This is the killer:

type Animal:
  speak -> String
data Talking, Dog, Cat
type Talking Dog is Animal...
type Talking Cat is Animal...

balto : Dog = Talking Dog
felix : Cat = Talking Cat
dogCat = balto @ felix

spoken = case dogCat of
  Animal: dogCat speak
  else: "whatever"

balto and felix never have a compile-time type of Animal, and neither does dogCat (its compile-time type is just (Cat Dog). Yet both of them have a runtime type of Animal, and we intuit that dogCat should, too. The question, again, is what that speak method does.

There aren't many good, simple solutions to this that I can think of. There are bad, simple solutions; and there are interesting, complex ones. These are the ones I came up with:

A runtime failure at balto @ felix, since it would create a possibly ambiguous object.
A runtime failure when dogCat is matched against Animal and then invokes speak, since this is the aforementioned ambiguity.
Pick one of the implementations by some arbitrary rule, like the left-hand side of the conjunction.
Run both methods and return the conjunction of their results.

I really want to avoid runtime failures here: it's too easy to stumble into this case. The last option is interesting in a computer sciencey way, but it creates a possibly complex world. Let's call this the Shroedinger approach, since it creates an object that's like a superpositioning of barks and meows. What if we then run that object through a pattern match?

who = case spoken of
  "Woof!": "It was a dog"
  "Meow!": "It was a cat"
  else: "What was it?"

If we took the Shroedinger approach for speak, then spoken is ("Woof!" @ "Meow!"); it can match against either of the first two cases. Pattern matching traditionally executes just the first branch that matches, but this approach is inconsistent with the Shroedinger approach that got us this spoken in the first place. So, let's stick with the Shroedinger philosophy of taking all of the cases and conjoining them: the resulting object is a conjunction ("It was a dog" @ "It was a cat"). Of course, the expression "It was a dog" could instead be some method, one which itself might return a "superpositioned" object; and the same could be true of "It was a cat". By the time we finish the case statement, who might be pretty darn complex!

In short, we have three options for handling ambiguity: runtime failure, subtle behavior that depends on composition being non-commutative, or subtle behavior in which each ambiguous decision split the program into a more ambiguous runtime state. The last of those options is the most novel and interesting, but eventually we have to collapse the wave-form: most of the time, we need to end up printing either "the dog says woof" or "the cat says meow."

One option is to provide both philosophies: two parallel case statements, one of which executes all branches and the other of which executes the first it finds. A similar option is to provide a sort of decomposition; given an object with a compile-time type Foo, return an object which is only a Foo, with all other composed objects removed.

Even if we take this approach, all this superpositioning makes the code a lot less strictly typed. The type system provides a very low bound on each object's guarantees! Traditional type systems generally make guarantees like, "it'll be a list, but we don't know what kind of list." Shroedinger-Effes will be able to say "it's an empty list, but we don't know if it's also a non-empty list."

It also raises the question of whether methods are overridden, or if they're always superimposed. Remember our first Balto, the Happy Dog? We intuited that its speak should override the "less specific" implementation provided by Dog. Does that still happen, or do we Shroedinger it up and invoke both?

There is one simpler option. I wrote above that because balto and felix each have a runtime type (but not compile-time type) of Animal we'd want their composed object to also have a runtime type of Animal. What if this weren't the case? That is, that runtime behavior could only ever override types and methods that are known at compile time, but not to perform arbitrary runtime checks such as whether dogCat is an Animal? That may be the way to go; I'll think on it for a bit.

Wednesday, August 7, 2013

Conjunction disjunction, what's your function?

It's time to turn my attention to functions. Like virtually any language, Effes has functions; like any language written in the last few years, those functions are data. (At what point does the term "functional programming language" start to lose its meaning, given that every language these days incorporates FP principles?)

Every function in Effes has a type, which is just its signature. For instance, doubler(i : Int) = i * 2.0 would have a type of Int -> Double. Subtyping basically follows the Liskov substitution principle: f : Vehicle -> Door is a subtype of g : Car -> Part because you can use f anywhere you'd be able to use g.

Like data types, function types can be composed into conjunctive and disjunctive types. Take the following:

data Cat, Person
countLegs : (Cat -> Int) = c -> 4
sayHi : (Person -> String) = p -> "Hi, I'm #{p.name}"

countHi = countLegs sayHi

countHi is a conjunction of countLegs and sayHi. Invoking this function doesn't chain or compose them; only one will get invoked. Think of it as a box that contains both functions: invoking this box means opening it, taking out one of the functions, and using it.

This follows the data trait analogy: a Cat Person type refers to an object that is both a feline and a human; it also can be thought of as two separate objects stuck together. When you have a Cat Person and ask for its fur type, you're looking at its cat-like nature; when you ask for its social security number, you're looking at its person-like nature. Similarly, countHi has both a countLegs-like nature and a sayHi-like nature.

(There's a slight difference here in that a conjunctive data type can be used in functions that look at both of its aspects; this doesn't apply to functions.)

This means that the input to countHi can either be a Cat (in which case countLegs is invoked) or a Person (in which case sayHi is invoked). In the first case, the result is an Int, and in the second case it's String. Putting that together, we get a type of (Cat | Person) -> (Int | String). More generally:

(Ia -> Ra) (Ib -> Rb) : (Ia | Ib) -> (Ra | Rb)

Disjunctions follow the same analogy; a Cat | Person type refers to a single object which is a feline or person, so functional disjunctions work the same way. Imagine a type that said "I'm either countLegs or sayHi". To pass an argument into it, you'd have to guarantee that the argument could be passed to either function; it would have to be both a Cat and a Person. Depending on which function got invoked, the result would be either an Int or a String.

(Cat -> Int) | (Person -> String)
 : (Cat Person) -> (Int | String)

(I0 -> R0) | (I1 -> R1) : (I0 I1) -> (R0 | R1)

Notice the symmetry here: The input to a conjunction of functions is the disjunction of their inputs, while the input to a disjunction of functions is the conjunction of their inputs. In both cases, the result is a disjunction of their results.

One edge case is when the input to a conjunction is composed of both input types. For instance, what happens if we break out the beakers and engineer a Cat Person? If we pass it to a conjunction countLegs sayHi, both functions in that composed function can process it. Which gets to? (This isn't a problem for a disjunctive type, since its object is just a single function.)

I'm not positive how to handle this case. One solution is to have both functions process it, and then compose the result. So, passing Cat Person to countHi = countLegs sayHi would result in a (4 "Hi, I'm Yuval") : (Int String).

Thursday, July 25, 2013

Polymorphic upcasting?

In my last post, I summarized Effes' polymorphism by observing that you can't know exactly how an object will behave unless you create it. That's actually not so weird — it's how all OO code works. I know exactly how an ArrayList works in Java; I have pretty much no clue how a List I get from a function will work (unless I look at that function's code, of course). The twist that Effes brings is that the binding is done based off of an object's specific composition, not based off its class. It's much more fluid.

That last post had various examples, but here's an interesting one: you can actually upcast an object's function resolution! Here's how:

data Loud; Animal; Cat
Cat is Animal

speak(c:Loud Animal) : String = "Stomp!"
speak(c:Loud Cat) : String = "Meow!"

getAnimal() : Animal = Loud Cat
animal = getAnimal() // 1
loudAnimal = Loud animal // 2

speak(animal)
speak(loudAnimal)

At line // 1, we get some sort of Animal. We don't know what kind, but we don't need to because of polymorphism. As it happens, it's a Loud Cat, which means it'll meow at us when we ask it to speak.

At line // 2, we take our animal and make it loud. The interesting thing to note here is that the animal already was loud, though the compiler doesn't know it: specifically, it was a Loud Cat. But, since the compiler doesn't know that, it creates a new object, with new bindings. At that point, the only thing the compiler knows is that it's a Loud Animal, so it binds it to the stomping version of speak.

This behavior is a bit counterintuitive at first. By doing what most systems would call a no-op (I even said it would be, last month!), an object's behavior changes. Specifically, adding the Loud trait to an object that already had it ended up changing its function binding.

This may seem weird, but I'd like to give it a try. My ~~hypothesis~~ hope is that the tradeoff of more deterministic object creation is worth it.

Incidentally, this rebinding would not happen if the operation was known to be a no-op at compile time:

getLoudAnimal() : Loud Animal = Loud Cat
animal = getLoudAnimal()
loudAnimal = Loud animal

Here, animal is already Loud, so adding Loud to it becomes a compile-time no-op, with no rebinding. While this particular example is silly, this use case is actually useful if the redundant trait has parameters that you'd like to change:

data Microchipped(id : Int)
data Cat

getAnimal() : Microchipped Animal =
  Microchipped(123) Cat

animal1 = getAnimal()
animal2 = Microchipped(456) animal1

Here, we're changing the animal's microchip id without having to worry about upcasting its function resolution.

Wednesday, July 24, 2013

Subtyping and function resolution (at last!)

I've been putting off for too long my ideas for subtyping and function resolution, because I've wanted to get things juuust right. Instead, I'm going to just outline my ideas and move on, because I want to start coming up with a grammar and writing some actual Effes. I've been dragging my feet for too long!

So, subtypes first, then function resolution.

Type hierarchy in Effes

A is a subtype of B if there is a "class-based" implementation that says B is A. For instance, if we declare that Stack[A] is Sizeable where..., then Stack[A] is a subtype of Sizeable
A is a subtype of B if A is a conjunction type whose parts include B. For instance, FourWheeled Vehicle is a subtype of FourWheeled and Vehicle.
Union types act as a single unit and don't establish any subtype relationships with their components. So:
- Cat | Mouse isn't a subtype of Cat or Mouse.
- Neither Cat nor Mouse are a subtype of Cat | Mouse.
- However, since Cat | Mouse is a single unit, Cute ( Cat | Mouse) is one of its subtypes.
Any type is of course a subtype of itself.
Without getting into too much details, subtypes work with the components of conjunction types, not just the full conjunction. For instance, if Box is a subtype of Container, then Big Box is a subtype of Big Container.

I think for generic types, I'll say that Box[Gum] is a subtype of Container[Candy] if Box is a subtype of Container and Gum is a subtype of Candy. This is extremely up for grabs; it can wreak havoc on collections, but that's because they're mutable. Effes has a much stricter set of restrictions on mutability, which I think eliminate the problem.

Function resolution

In a nutshell: functions are bound to an object according to the most specific type information known about it at compile time. (That last bit is important!) For each function f, find the most specific subtype that defines f, and bind that version of f to that object.

-- Define some types
data Animal; Ascii; Cat; Dog
Dog is Animal
Cat is Animal

-- Define some methods
speak(a:Animal) : String = "Zzzz" -- all animals sleep
speak(c:Cat) : String = "Purr"
speak(c:Ascii Cat) : String = ">^-^<"

-- Invoke them
speak(Cat) -- Purr
speak(Dog) -- Zzzz
speak(Ascii Cat) -- >^-^< 
peak(Dog Cat) -- compile-time error
speak(Ascii Animal Cat) -- >^-^<

The first three of the function invocations should be pretty straightforward. The fourth case is a compile-time error because Dog Cat is "equally subtype-y" to both Dog and Cat, so the compiler doesn't know which method to use. (Maybe I'll provide some overriding behavior in the future, but for now I'm just going to disallow it altogether.) The last one is an edge case — hey, every language has to have 'em! The reason is that type conjunctions are commutative and associative: (A B) C is the same as B (C A). So, even though Ascii Animal Cat doesn't look like anything meaningful, it can also be expressed as (Ascii Cat) Animal, which is a subtype of both Ascii Cat and Animal (as well as of Ascii and Cat, but that's not important for this example). Since Ascii Cat is itself a subtype of Animal, it is the most specific subtype and thus supplies the function. One last thing, which is the bit about binding at compile time. Let's tweak the menagerie a bit:

data Ascii; Hungry; Cat

speak(c:Ascii Cat) : String = ">^-^<"
speak(c:Hungry Cat) : String = "Meow!!" -- at 5am...
speak(c:Hungry Ascii Cat) : "4d 65 6f 77 21 2`"

getKitty() : Cat = Ascii Cat
c1 = getKitty() c2 = Hungry c1

In this example, c1 has a compile-time type of Cat and a runtime type of Ascii Cat. However, where the object was created (namely, the only expression in the body of getKitty), it had a compile-time type of Ascii Cat, so speak(c1) is bound to speak(c:Ascii Cat) and would result in ">^-^<". The last line is where things get potentially unfamiliar. Even though c1 has a runtime type of Ascii Cat, it has a compile-time type of just Cat, so Hungry c1 has a compile-time type of Hungry Cat, not Hungry Ascii Cat. This means speak(c2) would result in "Meow!!".

This feels a bit weird at first blush: it's a blend of polymorphism and... something that feels un-polymorphic. I may change my mind, but the reason I like it is that you can know, easily and deterministically, how functions will be bound just by looking at where an object is created. That is, you don't have to reason about where the object came from and what its runtime type might be; you just look at what it is, and there you go.

The basic rule is that you can't know exactly how an object will behave unless you create it.

Friday, June 28, 2013

When is a new object really a new object?

Since I just brought up object construction, it's worth discussing when a new object is really created, as opposed to when a previous object can be reused. This will also anticipate some of the function resolution requirements, which I'll introduce in a few posts. As usual, I'll lead by example.

First, some data types to work with:

data Red; Green; Blue nicknamed Color
data Colorful(color : Color)
data Box(size : Int)
data Fragile

Hopefully this should be getting familiar by now. We start with three stateless traits, the union of which is nicknamed Color. We then introduce two stateful traits, followed by a third trait to mark fragility. Now let's create some objects:

a = Colorful Box where
    color = Red
    size = 27
b = Fragile b
b2 : Box = b
c = Fragile b
c2 = Fragile b2
d = Colorful c where
    color = Blue

The first assignment just creates a composite object, as I described in the last post. Easy.

The second assignment (the one to b) creates a new object. It has to, because (as I'll elaborate in the upcoming function resolution post) the object's runtime type is maintained even if the object is "upcasted" later, as happens with b2. We need some way of knowing that b is Fragile while a is not, and the easiest (onlyest?) way to do that is by creating a new object.

But Fragile has no state, which means all Fragile objects are identical. In fact, internally, Fragile could be just a metadata marker on the object. That means that when we re-add the Fragile trait in c = Fragile b, we can actually reuse the same b object. This optimization can be done at compile time.

The c2 assignment is also a no-op, but only at runtime. Even though the compile-time type of b2 doesn't include Fragile, at runtime we can see that the b2 object is already Fragile, and so we simply return it.

Contrast that to the d assignment, which also re-sets the state of one of the component traits. In this case, we do need to create a new object in the general case: objects are immutable, so we can't change the state of c, but we definitely need to store the fact that we now have a blue box where we used to have a red one. (If the new color happens to be the same as the old one, we reuse the object in principle; I'm not sure if this check would be cheaper than unconditionally creating a new object.)

Implicit in all of the above is that there is not a way to check for referential equality — that is, that the programmer won't ever care (or know) if the runtime reuses an object. I think this is a good idea even without these optimizations, so I'm going to throw that into the Effes "spec." Truth be told, I've been assuming it all along.

Incidentally, since the compile-time type of c is Fragile Colorful Box, we could have just written d = Colorful c (without the where clause). This would say that we want to change none of the fields, in which case the whole thing is a no-op at compile-time (as c was). If Colorful had more than one field (maybe it has an alpha : Float), we could have used this syntax to change only one of the fields. This would obviously still create a new object.

On the other hand, if we'd written d2 = Colorful c2, then we'd have to provide the field values. c2 has a compile-time type of Fragile Box, and the compiler would require that if we add color to this box, we need to specify which color. The fact that c2 already has a color at runtime is irrelevant; the compiler will require that the program specific values for all Color fields (in this case, just the one), and the runtime will create a new object that overwrites these fields.

Monday, June 17, 2013

Stateful assumptions

The last major component of Effes' data types is the concept of stateful assumptions. As the last few posts have discussed, assumptions can be used to declare what Java would call an interface — a set of functions. The Stack[A] assumption I've been using as an example was satisfied ("implemented") by a nickname LinkedList[A], which stood in for a union type of (Nothing | Node[A]). Let's add into the mix another assumption, which says that an item has a size.

The Sizeable assumption just declares that something has a size. We can easily declare that being a Stack[A] implies being Sizeable:

Sizeable -> size :: Int

Stack[A] -> Sizeable st
    size = case head of
        Nothing -> 0
        _ -> 1 + (size pop)

This works, but it's O(N). Let's say we want to make this O(1); in this case, we'll need to maintain the size as a piece of state. Here's how we can do it:

QuickSizeable Stack[A] -> Sizeable st
    s :: Int initially (size it)
    pop -> s' = s - 1
    push _ -> s' = s + 1
    size = s

This defines a new assumption, QuickSizeable, and says that being a QuickSizeable Stack[A] implies being Sizeable (which the Stack[A] already was) with a given implementation. Note that QuickSizable Stack[A] is a composite type. Until now, we've seen composite union types; this is our first composite conjunctive type. In this example, implementation says that:

When you initially declare a value as being QuickSizeable Stack[A], QuickSizeable will initialize an s :: Int with the size of the Stack[A], however it determines it. At worst, it will be the generic, O(N) implementation.
When you pop from the Stack[A], the result's s' is one less than the original s. Remember that the apostrophe means "the un-apostrophe'd name within the context of the result."
When you push to the Stack[A], the result's s' is one more than the original.
The QuickSizeable Stack[A]'s size method simply returns s

The field s :: Int doesn't belong to either QuickSizeable or Stack[A]; it belongs to the composite type QuickSizeable Stack[A]. This is significant: composite types are "real" types that can define new state or new methods. (We've seen this before, when the LinkedList[A] nickname for the union type Nothing | Node[A] defined implementations of Stack[A] methods.)

This all makes sense, but there's a function resolution problem. Consider the following:

a :: Stack[Foo] = createSomeStack()
b :: QuickSizeable Stack[Foo] = a
c :: Stack[Foo] = b
a.size
b.size
c.size

What happens in each of the three size invocations? a.size is easy: the only method available is the O(N) version that any Stack[A] gets by the first bit of code, above. b.size is also easy: use the O(1) version defined by QuickSizeable Stack[A]. For the third version, we'd like to use the O(1) version — it's more efficient and intuitively feels like polymorphic overriding in OO languages — but we need a rule to allow this. The function resolution rules proposed in the last post are ambiguous.

The solution I have in mind, though I haven't formalized it yet, is that if a type A implies that all As are also B, then A's methods override B's. In this case, QuickSizeable Stack[A] -> Sizeable means that all QuickSizeable Stack[A]s are also Sizeable, so the implementation of size in QuickSizeable Stack[A] overrides the implementation that "belongs" to Stack[A], which we defined above.

What if we now define another sizer:

StupidSizer Stack[A] -> Sizeable st
    size = 1234

What happens with a type StupidSizer QuickSizeble Stack[A]? Does the size method act like a StupidSizer Stack[A] or a QuickSizeable Stack[A]? The call is ambiguous, and is thus disallowed. The type itself is allowed, and you can invoke any unambiguous methods on it; but attempting to invoke ambiguous methods, such as size in this case, will result in a compiler error. To fix it, you must first cast the object to a type with an unambiguous size.

Hm, I think I can unify some of the concepts in the last few posts. Wait for it...