Friday, June 28, 2013

When is a new object really a new object?

Since I just brought up object construction, it's worth discussing when a new object is really created, as opposed to when a previous object can be reused. This will also anticipate some of the function resolution requirements, which I'll introduce in a few posts. As usual, I'll lead by example.

First, some data types to work with:

data Red; Green; Blue nicknamed Color
data Colorful(color : Color)
data Box(size : Int)
data Fragile

This should be getting familiar by now. We start with three stateless traits, the union of which is nicknamed Color. We then introduce two stateful traits, followed by one more stateless trait to mark fragility. Now let's create some objects:

a = Colorful Box where
    color = Red
    size = 27
b = Fragile a
b2 : Box = b
c = Fragile b
c2 = Fragile b2
d = Colorful c where
    color = Blue

The first assignment just creates a composite object, as I described in the last post. Easy.

The second assignment (the one to b) creates a new object. It has to, because (as I'll elaborate in the upcoming function resolution post) the object's runtime type is maintained even if the object is "upcast" later, as happens with b2. We need some way of knowing that b is Fragile while a is not, and the easiest (onlyest?) way to do that is by creating a new object.

But Fragile has no state, which means all Fragile objects are identical. In fact, internally, Fragile could be just a metadata marker on the object. That means that when we re-add the Fragile trait in c = Fragile b, we can actually reuse the same b object. This optimization can be done at compile time.

The c2 assignment is also a no-op, but one that can only be detected at runtime. Even though the compile-time type of b2 doesn't include Fragile, at runtime we can see that the b2 object is already Fragile, and so we simply return it.
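To make the reuse rule for stateless traits concrete, here is a minimal Python sketch of how a runtime might model it. This is only an illustration under my own assumptions, not how Effes is implemented: the Obj class, its traits and fields attributes, and the add_stateless_trait helper are hypothetical names invented for this example. Python's `is` operator is used only to inspect the model from the outside; an Effes program would have no such operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """Hypothetical runtime object: a set of trait names plus immutable field state."""
    traits: frozenset
    fields: tuple  # (name, value) pairs, sorted by name


def add_stateless_trait(obj, trait):
    """Return an object that has `trait`, reusing `obj` when it already has it."""
    if trait in obj.traits:
        # The trait carries no state, so re-adding it cannot change anything.
        return obj
    # The runtime type changes, so a new object is needed.
    return Obj(obj.traits | {trait}, obj.fields)


a = Obj(frozenset({"Colorful", "Box"}), (("color", "Red"), ("size", 27)))
b = add_stateless_trait(a, "Fragile")    # new object: a itself must stay non-Fragile
b2 = b                                   # "upcast": same object, runtime type unchanged
c = add_stateless_trait(b, "Fragile")    # reuse: b is already Fragile
c2 = add_stateless_trait(b2, "Fragile")  # reuse, discovered by the runtime check
```

In this model the c case and the c2 case take the same code path. The difference in the post is only about who can know the answer: for c the compiler could skip the call entirely, while for c2 the membership check has to happen at runtime.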

Contrast that with the d assignment, which also resets the state of one of the component traits. Here we do need to create a new object in the general case: objects are immutable, so we can't change the state of c, but we definitely need to store the fact that we now have a blue box where we used to have a red one. (If the new color happens to be the same as the old one, we could reuse the object in principle; I'm not sure whether this check would be cheaper than unconditionally creating a new object.)
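The stateful case can be sketched the same way. Again, this is a hedged Python model rather than anything from an actual Effes runtime: Obj and set_fields are hypothetical names, and the equality check before copying is the optional optimization mentioned in the parenthetical above, not a settled design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """Hypothetical runtime object: a set of trait names plus immutable field state."""
    traits: frozenset
    fields: tuple  # (name, value) pairs, sorted by name


def set_fields(obj, **updates):
    """Return an object with the given fields set, copying only if something changes."""
    current = dict(obj.fields)
    # Optional check: if every requested value is already present, reuse obj.
    if all(name in current and current[name] == value
           for name, value in updates.items()):
        return obj
    # Objects are immutable, so a change of state means building a new object.
    current.update(updates)
    return Obj(obj.traits, tuple(sorted(current.items())))


c = Obj(frozenset({"Fragile", "Colorful", "Box"}), (("color", "Red"), ("size", 27)))
d = set_fields(c, color="Blue")     # state differs: a new object is created
same = set_fields(c, color="Red")   # state identical: c can be reused
```

Whether the comparison is worth its cost is exactly the open question: it trades a field-by-field check on every assignment against an allocation in the (possibly rare) case where nothing changed.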

Implicit in all of the above is that there is not a way to check for referential equality — that is, that the programmer won't ever care (or know) if the runtime reuses an object. I think this is a good idea even without these optimizations, so I'm going to throw that into the Effes "spec." Truth be told, I've been assuming it all along.

Incidentally, since the compile-time type of c is Fragile Colorful Box, we could have just written d = Colorful c (without the where clause). This would say that we want to change none of the fields, in which case the whole thing is a no-op at compile time (as the c assignment was). If Colorful had more than one field (maybe it has an alpha : Float), we could have used this syntax to change only one of the fields. Changing a field this way would, of course, still create a new object.

On the other hand, if we'd written d2 = Colorful c2, then we'd have to provide the field values. c2 has a compile-time type of Fragile Box, and the compiler would require that if we add color to this box, we specify which color. The fact that c2 already has a color at runtime is irrelevant; the compiler will require that the program specify values for all Colorful fields (in this case, just the one), and the runtime will create a new object that overwrites these fields.
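The compile-time rule from the last two paragraphs can be summarized in a few lines of Python. This is a rough sketch under my own assumptions: COLORFUL_FIELDS and missing_fields are hypothetical names, and a real compiler would work on types rather than sets of strings.

```python
# Hypothetical description of the Colorful trait's fields.
COLORFUL_FIELDS = ("color",)


def missing_fields(static_traits, provided):
    """Return the Colorful fields the compiler would still demand.

    static_traits: trait names in the operand's compile-time type.
    provided: field names given in the where clause.
    """
    if "Colorful" in static_traits:
        # Statically known to be Colorful: unspecified fields keep their old values.
        return []
    # Not statically Colorful: every field must be given, whatever the runtime object holds.
    return [name for name in COLORFUL_FIELDS if name not in provided]


from_c = missing_fields({"Fragile", "Colorful", "Box"}, set())  # d = Colorful c
from_c2 = missing_fields({"Fragile", "Box"}, set())             # d2 = Colorful c2
from_c2_ok = missing_fields({"Fragile", "Box"}, {"color"})      # d2 with a where clause
```

Only the compile-time type is consulted here, which is the point: c2's runtime color never enters the decision.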
