Sunday, January 12, 2014

A solution to the object-equality problem

As I mentioned in my last post, there are certain methods that require transitivity — and that means both objects must agree on a single implementation of those methods. For instance, if a `Ball` expects to compare itself against another `Ball` using only its radius, a `WeightedBall` cannot use radius and weight, even if it's being compared to another `WeightedBall`. If it did, transitivity could break when a plain, unweighted `Ball` joins the party.

My solution in Effes is to define `compiletime` methods, which look like normal instance methods but are in fact similar to `Comparator` objects whose instances are picked at compile time. This is basically an implementation of the strategy pattern.

To see this in action, consider the following snippet:

``````type Eq:
[compiletime] == (other: This) -> Bool

is Eq
[override compiletime] (== other):

type WeightedBall:
is Ball -- also inherits Eq
weight -> Int:
[override compiletime] (== other):

[override compiletime] (== other): (this: Ball) == other``````

First we define an `Eq` type, with a `compiletime` method named `==`. This method takes one argument, which must be of the same type as the declaring class (the `This` class is a reserved class name, just as `this` is a reserved variable name in many OO languages, including Effes). The `This` restriction is re-applied in each specific type, so `Ball ==` must take a `Ball` argument (not just a `Eq`).

The `WeightedBall` subtype of `Ball` redefines `==` to include the two balls' weights. But it also defines a "variant" called `ByRadius`, which only compares radii. It does this by casting itself down to a plain `Ball`, and then calling `==`, such that this call to `==` has a most specific implementation of `(Ball ==)`.

To someone trying to compare two objects, the `==` method looks like a normal method call. But under the hood, Effes delegates it to a strategy object. Here's a simple set implementation which maintains a list of elements and just checks each new element as it comes in, to make sure it's not already in the list:

``````type ListSet[T:Eq]:
val values : List[T] = Empty
add elem:T -> ListSet[T]: if values contains T
then this
else values' = values push elem
contains (needle : T): case of:
Nothing: False
needle: True
_: tail contains elem``````

The `==` magic happens in the last three lines. The `case head of` statement tries to match `head` against `needle`, using `==`. If they match, the answer is `True`; otherwise, we try recursively on the tail list.

As you can see, `contains` takes one argument, the `needle` we're looking for. But under the hood, it actually takes two arguments: the `needle`, and an "equality strategy" that holds some implementation of `==`. This object is picked at compile time, ensuring that runtime polymorphism doesn't screw up transitivity.

For instance, let's try to add a `heavyBall : WeightedBall` to two `ListSet`s, one typed to `Ball` and the other to `WeightedBall`:

``````plainSet : ListSet[Ball] = getSomeListSet
weightedSet : ListSet[WeightedBall] = getSomeListSet

Note that both variables are initially assigned to identical sets, meaning that both have all `WeightedBall` elements. But, `plainSet` is typed to `ListSet[Ball]`. The last two lines are internally translated to something like:

``````s1_ = plainSet add heavyBall (Ball ==)
s2_ = weightedSet add heavyBall (WeightedBall ==)``````

At the `==` call site (in `ListSet contains`), the `case head of ... needle` snippet is translated to something like `if (_equalityTester head needle)`, where `_equalityTester` is the strategy object.

If we wanted the set to contain `WeightedBalls`, but only restrict to unique radii, we would write:

``````uniqueRadii : ListSet[WeightedBall:ByRadius] = getSomeListSet

The `WeightedBall:ByRadius` bit tells the compiler to pick the `ByRadius` variant we defined above:

``s3_ = uniqueRadii add heavyBall (WeightedBall:ByRadius ==)``

There's one important bit to note here. Comparing two objects by radius is going to give you a different result than comparing them by radius and weight; there could be more duplicates. So when we cast the result of `getSomeListSet` to `ListSet[Ball]`, that's not the relatively light operation that a cast usually is. We have to actually rebuild the `ListSet` so that it knocks out those extra duplicates. I'm still working on the mechanism for this; maybe the casting is only allowed if the type defines a method for it, something like:

``````type ListSet[T:Eq]:

The signatures of these `compiletime` methods are a bit deceitful. They look like instance methods, and even define `this` but they're actually just normal two-arg methods. This is of course the same for all instance methods (`this` is always just an implicit argument), but it feels like more of a lie here. I think it's because the `this` argument isn't even semantically special; the use case for `compiletime` methods is precisely that `this` and `other` are on the same footing!

The reason I went for the cheat is simply that it feels more natural to me, even if it's a bit inaccurate. It lets us write things like `a == b` (instead of `== a b`, as would be the case for a non-instance method) without tricky rules for declaring methods as inline. My goal from the beginning was to try to get the best of both worlds that Java offers: the correctness of `Comparators` with the natural feel of `this.equals`. I think this approach succeeds in that goal, at the cost of pretty minimal sugar/lies.

Thursday, January 2, 2014

Equalities in object-oriented programs

There's a problem in object-oriented programming: equality and polymorphism don't generally get along. Specifically, extra state introduced in subclasses breaks important properties of equality like transitivity. I have an idea for a solution in Effes which, though I'm sure isn't novel, I haven't seen in other languages. But first, I'd like to take a post to explore some aspects of the problem.

Consider two classes: `Ball` and `WeightedBall extends Ball`. A `Ball` has one piece of state, `radius`; two objects are equal if they're both balls with the same radius. A `WeightedBall` adds another piece of state, `weight`. How does this factor into equality?

One answer could be that `WeightedBall` checks to see if the other object is a `WeightedBall` or just a plain `Ball`. If it's a `WeightedBall`, look at both the weight and the radius; otherwise, just look at the radius. In pseudocode:

``````boolean isEqual(Object other):
if other is a WeightedBall:
&& this.weight == other.weight
else if other is a Ball:
else:
return false``````

This approach fails because it doesn't satisfy transitivity. Consider three balls and some equalities:

• Given:
• `ball5 = Ball(radius=5)`
• `lightBall5 = WeightedBall(radius=5, weight=10)`
• `heavyBall5 = WeightedBall(radius=5, weight=200)`
• Then:
• ball5 == lightBall5 (only check radius)
• ball5 == heavyBall5 (only check radius)
• lightBall5 != heavyBall5 (check radius and weight)

In a language like Java, there aren't any great solutions to this. Either you require that both objects have exactly the same class, so that a `Ball` is never equal to a `WeightedBall`; or you don't let subclasses add state to the equality check, so that `lightBall5` and `heavyBall5` are equal because they only check radius, as specified by the `Ball.equals` contract.

This problem occurs with ordering, too. Java handles this variant by letting you create explicit `Comparator` classes, so that a `Comparator<? super Ball>` only checks radius while a `Comparator<? super WeightedBall>` also checks weight. This also lets you define several orderings, so that you can sort balls by radius, weight, bounciness, or any other property.

The `Comparator` pattern is really the right one. It lets the programmer pick the right ordering for each task, and it makes reflexivity and transitivity easy. The problem is that it's verbose: every method that cares about `T`'s ordering needs to take a `Comparator<? super T>`, and every class that defines an ordering needs to create a nested class, preferably as a singleton.

Java addressed this by creating a `Comparable` interface for providing a "natural" ordering; basically, a class can be its own comparator. That's helpful for the programmer who's writing a comparable class, but it inflates APIs that use ordering. Each one needs two variants: one that takes a `T extends Comparable<? super T>` and another that takes a `T` and a `Comparator<? super T>`. Here are two methods from `java.util.Collections`, for instance:

``````public static <T extends Comparable<? super T>>
void sort(List<T> list)
public static <T>
void sort(List<T> list, Comparator<? super T> c)``````

The `Comparator` pattern is a step in the right direction, but it does nothing for equality, nor for its close cousin, hashing. If you create a `HashSet<WeightedBall>`, it can only use the default definition of `equals` and `hashCode` that `WeightedBall` defines — and which must still be consistent with `Ball`'s `equals` and `hashCode`. I suppose you could create a `Hasher` interface similar to `Comparator`, but nobody does — the JDK certainly doesn't. Your only option is to wrap each `WeightedBall` in a `WeightedBallByWeight` object that defines equality/hash.

This hybrid approach — `Comparable` and `equals`/`hashCode` on the object, `Comparator` as a separate object — also leads to inconsistencies between ordering and equality. The JavaDocs for `TreeSet` includes this bit:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the `Set` interface.

This means it's virtually impossible to use a `Comparator` correctly in some cases. If `WeightedBall` above went with the exact-class approach (so that it can check both `radius` and `weight`), you can't create a `TreeSet<WeightedBall>` with a custom comparator that only checks radius; if you did, you could have two balls that are not equal (same radius, different weights) but can't both be in the set, because the custom ordering says they are equal.

Similarly, if you decided to let `Ball` define the equality interface, so that `WeightedBall` doesn't check `weight`, then you can't create a `TreeSet<WeightedBall>` with a `Comparator` that also factors in weight. If you did, you could have two objects that are both in the set, and yet are `equals` to one another!

What we really need is a hierarchy of `Comparator`-like interfaces:

• `Equalitator<T>` with `boolean areEqual(T a0, T a1)`
• `Hasher<T> extends Equalitator<T>` with `int hashCode(T a0)`
• `Comparator<T> extends Equalitor<T>` with `int compare(T a0, T a1)` and a restriction that `areEqual(a, b) == (compare(a, b) == 0)`

As I teased above, I have an idea for how to solve this in Effes. Stay tuned!