## Thursday, January 2, 2014

### Equalities in object-oriented programs

There's a problem in object-oriented programming: equality and polymorphism don't generally get along. Specifically, extra state introduced in subclasses breaks important properties of equality like transitivity. I have an idea for a solution in Effes which, though I'm sure isn't novel, I haven't seen in other languages. But first, I'd like to take a post to explore some aspects of the problem.

Consider two classes: `Ball` and `WeightedBall extends Ball`. A `Ball` has one piece of state, `radius`; two objects are equal if they're both balls with the same radius. A `WeightedBall` adds another piece of state, `weight`. How does this factor into equality?

One answer could be that `WeightedBall` checks to see if the other object is a `WeightedBall` or just a plain `Ball`. If it's a `WeightedBall`, look at both the weight and the radius; otherwise, just look at the radius. In pseudocode:

``````boolean isEqual(Object other):
if other is a WeightedBall:
&& this.weight == other.weight
else if other is a Ball:
else:
return false``````

This approach fails because it doesn't satisfy transitivity. Consider three balls and some equalities:

• Given:
• `ball5 = Ball(radius=5)`
• `lightBall5 = WeightedBall(radius=5, weight=10)`
• `heavyBall5 = WeightedBall(radius=5, weight=200)`
• Then:
• ball5 == lightBall5 (only check radius)
• ball5 == heavyBall5 (only check radius)
• lightBall5 != heavyBall5 (check radius and weight)

In a language like Java, there aren't any great solutions to this. Either you require that both objects have exactly the same class, so that a `Ball` is never equal to a `WeightedBall`; or you don't let subclasses add state to the equality check, so that `lightBall5` and `heavyBall5` are equal because they only check radius, as specified by the `Ball.equals` contract.

This problem occurs with ordering, too. Java handles this variant by letting you create explicit `Comparator` classes, so that a `Comparator<? super Ball>` only checks radius while a `Comparator<? super WeightedBall>` also checks weight. This also lets you define several orderings, so that you can sort balls by radius, weight, bounciness, or any other property.

The `Comparator` pattern is really the right one. It lets the programmer pick the right ordering for each task, and it makes reflexivity and transitivity easy. The problem is that it's verbose: every method that cares about `T`'s ordering needs to take a `Comparator<? super T>`, and every class that defines an ordering needs to create a nested class, preferably as a singleton.

Java addressed this by creating a `Comparable` interface for providing a "natural" ordering; basically, a class can be its own comparator. That's helpful for the programmer who's writing a comparable class, but it inflates APIs that use ordering. Each one needs two variants: one that takes a `T extends Comparable<? super T>` and another that takes a `T` and a `Comparator<? super T>`. Here are two methods from `java.util.Collections`, for instance:

``````public static <T extends Comparable<? super T>>
void sort(List<T> list)
public static <T>
void sort(List<T> list, Comparator<? super T> c)``````

The `Comparator` pattern is a step in the right direction, but it does nothing for equality, nor for its close cousin, hashing. If you create a `HashSet<WeightedBall>`, it can only use the default definition of `equals` and `hashCode` that `WeightedBall` defines — and which must still be consistent with `Ball`'s `equals` and `hashCode`. I suppose you could create a `Hasher` interface similar to `Comparator`, but nobody does — the JDK certainly doesn't. Your only option is to wrap each `WeightedBall` in a `WeightedBallByWeight` object that defines equality/hash.

This hybrid approach — `Comparable` and `equals`/`hashCode` on the object, `Comparator` as a separate object — also leads to inconsistencies between ordering and equality. The JavaDocs for `TreeSet` includes this bit:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the `Set` interface.

This means it's virtually impossible to use a `Comparator` correctly in some cases. If `WeightedBall` above went with the exact-class approach (so that it can check both `radius` and `weight`), you can't create a `TreeSet<WeightedBall>` with a custom comparator that only checks radius; if you did, you could have two balls that are not equal (same radius, different weights) but can't both be in the set, because the custom ordering says they are equal.

Similarly, if you decided to let `Ball` define the equality interface, so that `WeightedBall` doesn't check `weight`, then you can't create a `TreeSet<WeightedBall>` with a `Comparator` that also factors in weight. If you did, you could have two objects that are both in the set, and yet are `equals` to one another!

What we really need is a hierarchy of `Comparator`-like interfaces:

• `Equalitator<T>` with `boolean areEqual(T a0, T a1)`
• `Hasher<T> extends Equalitator<T>` with `int hashCode(T a0)`
• `Comparator<T> extends Equalitor<T>` with `int compare(T a0, T a1)` and a restriction that `areEqual(a, b) == (compare(a, b) == 0)`

As I teased above, I have an idea for how to solve this in Effes. Stay tuned!