Yuval Shavit: edge cases

Showing posts with label edge cases. Show all posts

Thursday, February 20, 2014

Method invocation syntax

I’ve been giving some thought to method invocation lately, trying to come up with something that’s fluent in simple cases, and familiar (to programmers) for the more complex cases. After a bit of playing around, I think I have a system I like.

Consider Java’s BigDecimal, and specifically its divide method. It feels very programmer-y:

aNum = myNum.divide(someOtherNum)

Wouldn’t it be nice if we could make this feel more natural?

aNum = myNum dividedBy someOtherNum

That suggests a grammar of object methodName arg0[, arg1, arg2...]. But if you have more than a couple args, this gets a bit muddy; the words all clump together in my brain, and it’s not entirely clear what’s what anymore: foo doBar baz, apple, banana, coconut. If anything, it looks like the logical grouping is (foo doBar baz) (apple banana coconut). Of course, it isn’t, and my brain knows that… but it’s not intuitive to my eye.

As I was looking around at various methods, I noticed another interesting thing: very often for multi-arg methods, there’s one “main” argument that’s followed by “secondary” arguments. In human-language grammar terms, there’s a single direct object, and then some adjectives and adverbs.

BigDecimal.divide(BigDecimal, RoundingMode) is a good example: the first argument is what you’re dividing by, and the second is just some almost-parenthetical info on how to do the division. It feels like this:

aNum = myNum dividedBy someOtherNum: HalfUp

This suggests a grammar of object methodName arg0 [: arg1, arg2, arg3...]. And that’s in fact what I think I’m going to go with (with a slight tweak that I’ll get to in a bit).

There’s an obvious problem, which is that not all methods follow that semantic pattern. For instance, List.sublist takes two arguments, fromIndex and toIndex. Neither modifies the other; they’re both “primary” args. (This may have been different if the arguments were fromIndex and length, but they’re not). You really do want to invoke this using the parentheses we all know and love:

aList = myList sublist (3, 7)

Yikes — does that mean I need two ways to invoke methods? Worse yet, do I let the call sites determine which to use, so that sometimes I’ll see myList sublist 3: 7 and sometimes I’ll see myNum dividedBy (someOtherNum, HALF_UP)? The latter isn’t bad, but I don’t want my language to encourage inconsistent style on things like this. So maybe I want to let the method declaration define which syntax to use… but how?

The solution is actually pretty simple: methods like sublist take only one arg, but it’s a tuple! That’s not enforced by the language, of course, but the syntax for declaring methods should mirror the syntax for calling them, so that things will naturally work out.

The one big issue with that grammar is that the : char is already used in lots of places, and in particular as a way of declaring a variable’s type (including to upcast it). For instance, myNum divided by someOtherNum : SomeType is ambiguous; does it take one arg, someOtherNum : SomeType, or does it take two args, someOtherNum and SomeType?

To solve this, I’m going to make a slight aesthetic concession and replace the : with {...} in method invocation.

aNum = myNum dividedBy someOtherNum { HalfUp } -- two args, num and mode
aList = myList sublist (3, 7)        -- one arg, a tuple of (start, end)

As I mentioned above, the method declaration should mirror invocation. Something like:

dividedBy divisor:BigDecimal { mode: RoundingMode } -> BigDecimal: ...
aList (start: Int, end: Int) -> List[E]: ...

I like this approach a lot, except for the curly braces. Ideally I’d use a colon, or even a pipe, but all of the single-char approaches I could think of would either cause ambiguity or be ugly. For instance, a pipe would be fine at the call site, but create visual ambiguities at declaration:

dividedBy divisor: BigDecimal | mode: RoundingMode -> BigDecimal: ...

That pipe looks like a disjunctive type at a glance. This isn’t an ambiguity from the grammar’s perspective, since mode is clearly a variable name and not a type (Effes enforces the capitalization scheme that Java suggests), but it’s not nice on the eyes. Some optional parentheses would help, but it’s hard to get excited about that. So for now, curly braces are it.

The thing I like about this syntax is that with one rule, I get everything I want. Simple methods look fluent; methods with adverbs look good (if a tad clunky with the braces); and in the worst case, I get something that’s no worse than what most of the popular languages out there require or recommend.

Thursday, January 2, 2014

Equalities in object-oriented programs

There's a problem in object-oriented programming: equality and polymorphism don't generally get along. Specifically, extra state introduced in subclasses breaks important properties of equality like transitivity. I have an idea for a solution in Effes which, though I'm sure isn't novel, I haven't seen in other languages. But first, I'd like to take a post to explore some aspects of the problem.

Consider two classes: Ball and WeightedBall extends Ball. A Ball has one piece of state, radius; two objects are equal if they're both balls with the same radius. A WeightedBall adds another piece of state, weight. How does this factor into equality?

One answer could be that WeightedBall checks to see if the other object is a WeightedBall or just a plain Ball. If it's a WeightedBall, look at both the weight and the radius; otherwise, just look at the radius. In pseudocode:

boolean isEqual(Object other):
    if other is a WeightedBall:
        return this.radius == other.radius
            && this.weight == other.weight
    else if other is a Ball:
        return this.radius == other.radius
    else:
        return false

This approach fails because it doesn't satisfy transitivity. Consider three balls and some equalities:

Given:
- ball5 = Ball(radius=5)
- lightBall5 = WeightedBall(radius=5, weight=10)
- heavyBall5 = WeightedBall(radius=5, weight=200)
Then:
- ball5 == lightBall5 (only check radius)
- ball5 == heavyBall5 (only check radius)
- lightBall5 != heavyBall5 (check radius and weight)

In a language like Java, there aren't any great solutions to this. Either you require that both objects have exactly the same class, so that a Ball is never equal to a WeightedBall; or you don't let subclasses add state to the equality check, so that lightBall5 and heavyBall5 are equal because they only check radius, as specified by the Ball.equals contract.

This problem occurs with ordering, too. Java handles this variant by letting you create explicit Comparator classes, so that a Comparator<? super Ball> only checks radius while a Comparator<? super WeightedBall> also checks weight. This also lets you define several orderings, so that you can sort balls by radius, weight, bounciness, or any other property.

The Comparator pattern is really the right one. It lets the programmer pick the right ordering for each task, and it makes reflexivity and transitivity easy. The problem is that it's verbose: every method that cares about T's ordering needs to take a Comparator<? super T>, and every class that defines an ordering needs to create a nested class, preferably as a singleton.

Java addressed this by creating a Comparable interface for providing a "natural" ordering; basically, a class can be its own comparator. That's helpful for the programmer who's writing a comparable class, but it inflates APIs that use ordering. Each one needs two variants: one that takes a T extends Comparable<? super T> and another that takes a T and a Comparator<? super T>. Here are two methods from java.util.Collections, for instance:

public static <T extends Comparable<? super T>>
       void sort(List<T> list)
public static <T>
       void sort(List<T> list, Comparator<? super T> c)

The Comparator pattern is a step in the right direction, but it does nothing for equality, nor for its close cousin, hashing. If you create a HashSet<WeightedBall>, it can only use the default definition of equals and hashCode that WeightedBall defines — and which must still be consistent with Ball's equals and hashCode. I suppose you could create a Hasher interface similar to Comparator, but nobody does — the JDK certainly doesn't. Your only option is to wrap each WeightedBall in a WeightedBallByWeight object that defines equality/hash.

This hybrid approach — Comparable and equals/hashCode on the object, Comparator as a separate object — also leads to inconsistencies between ordering and equality. The JavaDocs for TreeSet includes this bit:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.

This means it's virtually impossible to use a Comparator correctly in some cases. If WeightedBall above went with the exact-class approach (so that it can check both radius and weight), you can't create a TreeSet<WeightedBall> with a custom comparator that only checks radius; if you did, you could have two balls that are not equal (same radius, different weights) but can't both be in the set, because the custom ordering says they are equal.

Similarly, if you decided to let Ball define the equality interface, so that WeightedBall doesn't check weight, then you can't create a TreeSet<WeightedBall> with a Comparator that also factors in weight. If you did, you could have two objects that are both in the set, and yet are equals to one another!

What we really need is a hierarchy of Comparator-like interfaces:

Equalitator<T> with boolean areEqual(T a0, T a1)
- Hasher<T> extends Equalitator<T> with int hashCode(T a0)
- Comparator<T> extends Equalitor<T> with int compare(T a0, T a1) and a restriction that areEqual(a, b) == (compare(a, b) == 0)

As I teased above, I have an idea for how to solve this in Effes. Stay tuned!

Tuesday, November 19, 2013

I'm going to have to maybe-kill the cat

In my last post, I discussed problems with the runtime binding of composed objects: if object a has a method foo, and object b also has a method foo, then how does the composed object (a <?> b) behave? In this post, I'd like to explore a separate but related pair of problems: what's the type of a <?> b, both at run-time and compile-time?

(Before you get too far into this post, a disclaimer: this post describes a method of composition and then explains why it doesn't work. If that strikes you as meandering and pointless — if you prefer to read about ideas that might work, rather than those that definitely won't — then you might want to skip this post.)

As before, let me set this up with an example. As an added bonus, you'll get to see the latest revision of Effes' syntax, which is pretty close to complete (now I really mean it!). Since I haven't laid out how composition works, I'll use <?> as a placeholder for "some variant of composition." With that said:

type List[A] = data Node(head:A, tail:List[A]) | Nothing
  put e:A -> List[A] = TODO
  get -> (Maybe[A], List[A]) = TODO

type Box[A] = Maybe[A]:
  put e:A -> Box[A] = TODO
  get -> Maybe[A] = TODO

myList : List[Int] = TODO
myBox : Box[Int] = TODO
myContainer = myList <?> myBox

Here we have two simple types: one representing a linked list, and one representing a box that can contain zero or one items. Their details aren't important. We also have three references: one of type List[A], one of Box[A], and one representing the composition of the first two.

As I discussed in the last post, we have some options with a call like myContainer.put 1. We can take the Schrodinger approach, in which we invoke put on both components of myContainer and then compose those two results; or we can pick one as the "winning" binding and only invoke it.

Let's say we pick the single-bound, "winner" approach; that's the option I was leaning towards in the last post. To make it concrete, let's say that the left-hand side always wins out. That seems reasonable, but what about this situation:

myContainer2 = myContainer.put 123
myVal : Maybe[Int]
myVal = case myContainer2 of
  Box[Int] -> myContainer.get
  _ -> Nothing

In this snippet, we check to see if myContainer is a Box[Int]. It is, so the first pattern will match. This casts myContainer2 to Box[Int] and, within that new context, invokes get. Note that get here has to be bound to the Box version; the List[A] version has a different return type (it returns a pair containing the list's possible head and its tail).

The problem is that myContainer.put was bound to the List[A] method, meaning that the Box component of myContainer never had 123 inserted into it: the value of myVal is Nothing, not 123! This is fully consistent, but it's confusing and violates the principle of least surprise. There has to be a better way!

One possibility is to limit composition such that myContainer2 does not have a runtime type of Box[Int]: the second (catch-all) pattern matches, and the value of myVal is Nothing.

Of course, we always want the runtime type to be a subtype of the compile-time type, so this new approach means that the composed object's compile-time type can't include Box[Int]. We can't generalize that to all compositions, or else the whole idea of composition falls apart and becomes shorthand for the not-very-useful function a <?> b = a.

We can come up with a more precise limit on composition, one that doesn't just throw away the RHS altogether. One approach to compose objects in multiple phases. Here's one such algorithm, in rough terms:

Decompose the RHS object to its constituent objects.
Fold each one of those into the LHS, one at a time. As you fold each object, check to see whether it has any methods that conflict with methods on the composed object; if so, ignore that object (don't fold it in).
Check whether the resulting object has any abstract methods (that weren't given a concrete definition in the composed object). If so, remove the objects that introduced those methods.

These three phases happen separately for both the compile-time type and the runtime type. Here's an example, starting with some types and some objects:

type List[A] =
    ... (put/get, as above)
    size -> Int
type Box[A] = ... (put/get, as above)

type Container:
  isEmpty -> Bool -- abstract method

type Container Box[A]:
  @override isEmpty -> Bool = TODO

intsList : List[Int] = list(1, 2, 3)
intsBox : Box[Int] = box(4)

intsContainer1 = intsBox <?> Container

The List[A] and Box[A] types are as before, except that List[A] now also has a size method.

Let's look at that last line. The LHS is Box[Int] with methods put and get, while the RHS is Container with just one method, isEmpty. There's no overlap in methods, so the composition is straightforward and results in Box[Int] <?> Container for both the compile-time and runtime type.

Okay, so that case works fine. But what about this?

intsContainer2 : Container = intsContainer1
composed2 = intsContainer2 <?> intsList
composedSize = composed2.size

At compile time, that last line looks like Container <?> List[Int], which has no conflicts and thus doesn't remove any types. But the Container component's isEmpty method isn't implemented, so it's removed: the resulting type is List[Int]. We call its size method, which is a pretty reasonable operation to call on a list.

At runtime, the composition looks like (Box[Int] <?> Container) <?> List[Int], and there is a conflict: List[Int]'s methods collide with Box[Int], and we therefore don't fold List[Int] into the composition. That means that the resulting type is just Box[Int] <?> Container — the inverse of the compile-time type! And in particular, there's no size method on that object. Boom.

I've tried a few variants on this sort of decompose-and-recompose theme: decomposing both sides, different handling of abstract methods, etc. Inevitably, the mismatch between runtime and compile-time types always blows up in my face. I don't think there's a way around it.

Unfortunately, I think this leads me to the conclusion that if I'm going to do composition, I have to do composition all the way: Schrodinger-style. This means I'll need to figure out various issues about collapsing the wave form: how does pattern-matching work, and what happens when an object is composed with an object of the same type? For instance, how does "foo" <?> "bar" work?

Friday, October 11, 2013

Of object composition and a maybe-dead cat

There's a problem with Effes that could prove deadly, at least to object composition: composed objects can conflict in ways that can't be caught at compile time, and there's no satisfying, simple way to resolve the conflicts. It's a problem I realized a few weeks ago while thinking about various use cases for composition.

To warm up, consider these types:

Animal
Dog is Animal
Happy
(Happy Dog) is Animal

Dog and Happy are simple types, and (Happy Dog) is their conjunction. Animal is an "interface-y" type with implementations provided by Dog and (Happy Dog). Let's say balto has a compile-time and runtime type of Happy Dog, and that Animal provides a method speak. Does balto speak invoke the implementation for Dog or Happy Dog?

That's not so tricky (I said it was a warmup!). Happy Dog is intuitively more specific than Dog, so we should invoke its speak. Let's try something just a bit trickier:

type Animal:
  speak -> String
data Dog is Animal:
  speak = "Woof!"
data Cat is Animal
  speak = "Meow!"

balto = Dog
felix = Cat
dogCat = balto @ felix
spoken = dogCat speak

This is trickier. The object dogCat is both a Dog and a Cat, both of which provide equally-specific overrides of Animal::speak. Which gets invoked? If your instinct is to just reject that at compile-time, consider this slightly more dynamic variant:

type Animal:
  speak -> String
data Talking, Dog, Cat -- no inherent relationships to Animal
type Talking Dog is Animal:
  speak = "Woof!"
type Talking Cat is Animal:
  speak = "Meow!"

balto = Talking Dog
felix : Cat = Talking Cat
dogCat = balto @ felix
spoken = dogCat speak

Here, felix has a run-time type of Cat but a compile-time type of Talking Cat, which is Animal. This means we can't detect a conflict at compile time, but dogCat still has two equally-specific implementations of speak at runtime.

We can come up with something clever based on compile-time contortions: take only the left object's implementation when they're composed, for instance, or take only the one whose compile-time type provides an implementation and fail to compile if they both provide an implementation. These would all work for the examples so far, though they're not very inspiring.

The problem is that Effes, like other functional languages, lets you do pattern matching — that is, runtime type checks. This is the killer:

type Animal:
  speak -> String
data Talking, Dog, Cat
type Talking Dog is Animal...
type Talking Cat is Animal...

balto : Dog = Talking Dog
felix : Cat = Talking Cat
dogCat = balto @ felix

spoken = case dogCat of
  Animal: dogCat speak
  else: "whatever"

balto and felix never have a compile-time type of Animal, and neither does dogCat (its compile-time type is just (Cat Dog). Yet both of them have a runtime type of Animal, and we intuit that dogCat should, too. The question, again, is what that speak method does.

There aren't many good, simple solutions to this that I can think of. There are bad, simple solutions; and there are interesting, complex ones. These are the ones I came up with:

A runtime failure at balto @ felix, since it would create a possibly ambiguous object.
A runtime failure when dogCat is matched against Animal and then invokes speak, since this is the aforementioned ambiguity.
Pick one of the implementations by some arbitrary rule, like the left-hand side of the conjunction.
Run both methods and return the conjunction of their results.

I really want to avoid runtime failures here: it's too easy to stumble into this case. The last option is interesting in a computer sciencey way, but it creates a possibly complex world. Let's call this the Shroedinger approach, since it creates an object that's like a superpositioning of barks and meows. What if we then run that object through a pattern match?

who = case spoken of
  "Woof!": "It was a dog"
  "Meow!": "It was a cat"
  else: "What was it?"

If we took the Shroedinger approach for speak, then spoken is ("Woof!" @ "Meow!"); it can match against either of the first two cases. Pattern matching traditionally executes just the first branch that matches, but this approach is inconsistent with the Shroedinger approach that got us this spoken in the first place. So, let's stick with the Shroedinger philosophy of taking all of the cases and conjoining them: the resulting object is a conjunction ("It was a dog" @ "It was a cat"). Of course, the expression "It was a dog" could instead be some method, one which itself might return a "superpositioned" object; and the same could be true of "It was a cat". By the time we finish the case statement, who might be pretty darn complex!

In short, we have three options for handling ambiguity: runtime failure, subtle behavior that depends on composition being non-commutative, or subtle behavior in which each ambiguous decision split the program into a more ambiguous runtime state. The last of those options is the most novel and interesting, but eventually we have to collapse the wave-form: most of the time, we need to end up printing either "the dog says woof" or "the cat says meow."

One option is to provide both philosophies: two parallel case statements, one of which executes all branches and the other of which executes the first it finds. A similar option is to provide a sort of decomposition; given an object with a compile-time type Foo, return an object which is only a Foo, with all other composed objects removed.

Even if we take this approach, all this superpositioning makes the code a lot less strictly typed. The type system provides a very low bound on each object's guarantees! Traditional type systems generally make guarantees like, "it'll be a list, but we don't know what kind of list." Shroedinger-Effes will be able to say "it's an empty list, but we don't know if it's also a non-empty list."

It also raises the question of whether methods are overridden, or if they're always superimposed. Remember our first Balto, the Happy Dog? We intuited that its speak should override the "less specific" implementation provided by Dog. Does that still happen, or do we Shroedinger it up and invoke both?

There is one simpler option. I wrote above that because balto and felix each have a runtime type (but not compile-time type) of Animal we'd want their composed object to also have a runtime type of Animal. What if this weren't the case? That is, that runtime behavior could only ever override types and methods that are known at compile time, but not to perform arbitrary runtime checks such as whether dogCat is an Animal? That may be the way to go; I'll think on it for a bit.

Thursday, July 25, 2013

Polymorphic upcasting?

In my last post, I summarized Effes' polymorphism by observing that you can't know exactly how an object will behave unless you create it. That's actually not so weird — it's how all OO code works. I know exactly how an ArrayList works in Java; I have pretty much no clue how a List I get from a function will work (unless I look at that function's code, of course). The twist that Effes brings is that the binding is done based off of an object's specific composition, not based off its class. It's much more fluid.

That last post had various examples, but here's an interesting one: you can actually upcast an object's function resolution! Here's how:

data Loud; Animal; Cat
Cat is Animal

speak(c:Loud Animal) : String = "Stomp!"
speak(c:Loud Cat) : String = "Meow!"

getAnimal() : Animal = Loud Cat
animal = getAnimal() // 1
loudAnimal = Loud animal // 2

speak(animal)
speak(loudAnimal)

At line // 1, we get some sort of Animal. We don't know what kind, but we don't need to because of polymorphism. As it happens, it's a Loud Cat, which means it'll meow at us when we ask it to speak.

At line // 2, we take our animal and make it loud. The interesting thing to note here is that the animal already was loud, though the compiler doesn't know it: specifically, it was a Loud Cat. But, since the compiler doesn't know that, it creates a new object, with new bindings. At that point, the only thing the compiler knows is that it's a Loud Animal, so it binds it to the stomping version of speak.

This behavior is a bit counterintuitive at first. By doing what most systems would call a no-op (I even said it would be, last month!), an object's behavior changes. Specifically, adding the Loud trait to an object that already had it ended up changing its function binding.

This may seem weird, but I'd like to give it a try. My ~~hypothesis~~ hope is that the tradeoff of more deterministic object creation is worth it.

Incidentally, this rebinding would not happen if the operation was known to be a no-op at compile time:

getLoudAnimal() : Loud Animal = Loud Cat
animal = getLoudAnimal()
loudAnimal = Loud animal

Here, animal is already Loud, so adding Loud to it becomes a compile-time no-op, with no rebinding. While this particular example is silly, this use case is actually useful if the redundant trait has parameters that you'd like to change:

data Microchipped(id : Int)
data Cat

getAnimal() : Microchipped Animal =
  Microchipped(123) Cat

animal1 = getAnimal()
animal2 = Microchipped(456) animal1

Here, we're changing the animal's microchip id without having to worry about upcasting its function resolution.