Yuval Shavit: mutability

Showing posts with label mutability. Show all posts

Friday, July 26, 2013

Another blow against strict immutability

I know I've talked about mutability before, but I realized the other day yet another argument against taking immutability too far: it complicates logging.

I'll take Haskell as my usual favorite example, but this time I won't put it in a favorable light. Because it treats all I/O as non-pure, mutable behavior, any logging you do has to be within the IO monad. This means you have to change your application's structure to support logging. You have to do so within a monad, and, if that monad doesn't happen to be IO, it needs to be a transformer monad. If it's not, and you want logging — find a new monad.

(Full disclosure: I haven't written any real apps in Haskell, so I haven't needed logging past what the repl shell gives me. Maybe all of this is not a problem in practice.)

You can cheat a bit: there's a debug module that lets you log from pure code, but it's not meant for production. For the most part, pure code just can't be logged.

To me, logging is critical. It's important both for "interactive" debugging (as I'm writing new code) and for figuring out what went wrong when a bug crops up in the field. At the same time, it's orthogonal to the logic of the application, and so should dictate as little as possible about how the application is set up. If I have to turn a pure function into an impure one just so I can log some values, that seems like the tail wagging the dog.

Even though Effes is restrictive about mutable types, it considers loggers immutable because their actions (namely, printing bytes to a file) can't directly change the behavior of Effes objects. This means you can use a logger anywhere you like, which is good.

Tuesday, July 2, 2013

Immutability and randomness

In my last post, I proposed a working definition for mutability: in short, an object mutable if code can directly affect its state. I mentioned one odd implication of this rule: certain OS calls like getCurrentTime are considered immutable even though their results are different each time. An even trickier problem is that of randomness and pseudo-randomness.

Let's take a random number generator first. There are various ways to produce truly random (or very close to it) number sequences: using the weather, relying on hardware inputs, or even leaning on quantum mechanics. A language can't affect any of these, so random number generators that use them fall within the realm of immutable objects by my definition.

On the other hand, a pseudo-random number generator's sequence is completely based on code, once its initial state is set. The PRNG has some seed, and when asked it does some math to determine a new seed and a result for the random. This means that PRNGs are "normal" objects that have to participate in a language's mutability rules.

For languages that have immutable objects, this makes PRNGs awkward to work with; rather than just asking for the next random number, you have to ask for a pair (randomNumber, nextPrng). If you forget to do this, your pseudo-random sequence will be boringly predictable. Here's an example from Haskell (using ghci):

> prng <- getStdGen
> next prng
(750537749,750578441 40692)
> next prng
(750537749,750578441 40692)
> next prng
(750537749,750578441 40692)

As you can see, this is quite boring; the reason is that the second element of the returned pair (shown here as "750578441 40692") is the crucial next-immutable-PRNG, which we're not using. Here's that code done right:

> prng0 <- getStdGen
> let (r1, prng1) = next prng0 :: (Int, StdGen)
> let (r2, prng2) = next prng1 :: (Int, StdGen)
> let (r3, prng3) = next prng2 :: (Int, StdGen)
> r1
750537749
> r2
1579754317
> r3
1580611703

This is the behavior we want, but look how much more code it took! Even ignoring the ":: (Int, StdGen)" bits (that's Haskell needing some help with its type inference), it took twice the lines of code and required littering the block with one-time-use prngN variables.

This is a case where I think Effes' semi-mutability really shines. Once you get the PRNG object, you can use it all you want, and it'll mutate its internal state to update its seed as it goes. If you pass it to a function, that invalidates (at compile time) your copy; you have to get a different one if you want to continue generating random numbers. This is a tad more cumbersome, but not much (there are ways to alleviate it, such as "forking" the PRNG before passing it to the function).

Monday, July 1, 2013

Immutability isn't an exact science with these clowns

The last few posts have read like a reference manual more than a blog, so I'm going to switch things up a bit and write about immutability at a high level. This post will have an Effes-based perspective, but it won't tie in directly to the language.

The question of the day is: what makes an object immutable? The easiest and (almost) most restrictive definition is that nothing about the object can change — its bytes are determined at construction time and never altered afterwards.

Well, what about an object that lazily caches the result to some method — a lazy-evaluated asString field for the object's toString representation? The first invocation of toString will modify the object's bytes, but I would argue that it's still immutable because its observable behavior has not. This is my version of the equivalence principle: if you can't tell from your rocket ship that the universe has changed, then it hasn't in any real sense.

Let me take it a step further: is stdout mutable? Let's assume that a write to stdout never fails, that the stream is never closed, and that the stdout object doesn't contain state such as "how many bytes have been written so far." In this case, from the language's perspective, the stdout object has no state, which is just about the most immutable a thing can be.

On the other hand, most output streams are mutable, because they can be closed. Streams can also fail due to things like a full disk or dropped connection, but I see those as two different scenarios: in one, the behavior of the object changed due directly to an action the code took, whereas in the other, the behavior changed due to an external event.

This is an important distinction, because otherwise even the original, "most restrictive" definition above is not enough. Let's say we want to define immutability strictly as "the object's behavior never changes," so that an uncloseable output stream is mutable (because it may fail, thus altering its behavior). Now consider an object with no pre-calculated string representation; its toString method generates a new string each time. This requires allocating memory, which itself may fail — so with our super-strict definition of immutability, even this object isn't immutable. With this definition, the only immutable objects are one with immutable, public fields and nothing else: even a getter could fail due to a stack overflow.

And so, I submit this as a working definition of mutability, at least for Effes: an object is immutable if its behavior can't change in a deterministic way as a direct result of actions expressible in the language.

This definition has an interesting implication: a method that gets the current date/time from the OS is immutable, even though it gives a different result each time! (Let's assume the language doesn't have features built into it that set the system clock.) As weird as this sounds, it actually makes sense, given the problems that I think immutability solves — specifically, no scope can change the date/time for another scope. You won't ever invoke getCurrentTime expecting it to have been set by another scope even though it wasn't; and you won't ever invoke getCurrentTime from one scope expecting it not to have been changed, even though somebody else changed it.

Wednesday, May 29, 2013

Semi-mutability in Project Effes

Mutable data can lead to confusing code, but it's also useful when it lets developers think in simpler, more convenient patterns — like for loops instead of recursion. How would I apply this to my pipe dream of a language, Effes?

The first thing is to identify the areas where I think mutability would be useful. Multithreading is an obvious pick, but I'll put that aside for now; I'm not sure yet what sort of multithreading model I want to have in my language, if any. Some others are for loops, streams (especially input streams, since output streams can appear to have no state to the program) and building up data structures which would be inefficient to implement without mutable data (hash maps, for instance).

In all these cases, I'll be relying on my conclusion from the previous post (link above), which is that the problem is not mutable data, but data which is mutated outside the current scope.

Loops

I've written some toy programs in Haskell, and I'm getting the hang of always using tail recursion and folds. But I've got to admit, a lot of times I think in terms of conventional loops. Sure, I can translate those loops to their functional equivalents, but why should I? Even Haskell makes some grudging concessions with its do notation, which sugars purely functional concepts into a more palatable, imperative style.

The solution here is simple: local variables can always be reassigned, and Effes will support the usual control flow statements: for, while, etc. I'll probably have lambdas, in which case they'll close over the variable as it was at the lambda's creation — the variable within the lambda will not update if the local variable changes, as that would allow cross-scope mutable data.

Input streams and mutable data structures

Input streams are an interesting beast. They have to be mutable in some sense, because it's pretty useless to open a file handle and then keep reading the first byte from it.

Haskell addresses this problem via its laziness. I won't get into the details, but Haskell's laziness has its own issues, and I don't want to go down that path. I'm leaning toward a different solution: you can have mutable state, but you can only use it in limited ways:

You can always return mutable data, in which case it will be mutable for the call site that uses it
You can pass mutable data to a function call, but at that point you cannot use that data anymore in the calling function
You can store mutable data in a data structure, including a lambda, but that structure is then also considered mutable; and the original reference is now illegal to use, as in the previous point.

The first two factors combine to prevent data from being changed from under you, while still permitting things like input streams. You would get an input stream, which is mutable, from some source; read read read; and then return pass it to your call site. But no method you call will ever read bytes that you expected it not to consume; for it to do that, you would have to pass it to a method, but at that point you're not allowed to use it anymore.

If you do want to continue using mutable data after you pass to it a method, the method you call will have to return it back to you. This acts as a big, unavoidable warning post: what I'm getting back is not under my control, so I'd better read the function's docs to see what its return value is.

This lets us do lots of useful things. You can get mutable input streams; you can wrap them into filtering/transforming input streams that you then use instead; you can build hash maps efficiently; and you don't have to worry that code outside of the few lines in the current method will have done something you don't expect. Alternatively, you don't have to rely on some code far away "fixing up" your data structure at some point in the maybe-future, because they're not allowed to.

All together now

Here are some examples. First the basics: data is immutable and can be used without restriction — except, of course, that you can't modify it. References can be re-assigned as you please:

immutableFoo = [1, 2, 3] # immutable
immutableFoo = [7, 8, 9]
immutableFoo.append(7) # Error!
# Can't modify immutable data.
someMethod(immutableFoo) # okay
anotherMethod(immutableFoo) # okay

Mutable data is only slightly more involved. Once you pass a mutable structure to a method or use it to create a lambda (or other data structure), you can't use it directly anymore.

mutableBar = mutable [4, 5, 6]
for (i = 0; i < 5; ++i) {
  mutableBar.append(i)
}
someMethod(mutableBar)
anotherMethod(mutableBar) # Error!
# mutableBar was passed to a method
# and is now invalid in this scope
 
mutableBar = mutable [1, 2, 3] # Re-assign the var
lamb = () -> mutableBar.append(4)
someMethod(mutableBar) # Error!
# mutableBar was used to construct a lambda
# and is now invalid in this scope

Mutability is contagious. Any structure that contains mutable data is considered mutable itself, even if that mutability can't be directly accessed.

lamb = () -> mutableBar.append(4)
lamb()
yetAnotherMethod(lamb)
lamb() # Error!
# lamb was constructed with mutable data and is
# thus mutable. It therefore became invalid when
# we passed it to yetAnotherMethod

The solution is to return back a mutable structure. Here we pass a mutable array to niceMethod, at which point we can't use it; but we get back a different (as far as we know) mutable list, which we then reassign back to mutableBar.

mutableBar = mutable [1, 2, 3]
mutableBar = niceMethod(mutableBar)
mutableBar.append(4)

Tuesday, May 28, 2013

The problem with mutable types

Generally speaking, languages focus mostly on either mutable or immutable types. Of course, it's not quite as clear-cut as that n practice; the usual caveats about broad brushes apply. For instance, Haskell usually describes itself as having immutable types (at least in the intro-ish books), but its MVar is mutable; C and C++ have generally mutable types, but const can give you immutability.

I wasn't sure at first whether I'd want my new language to have mutable types. Even though I want my language to be largely functional (or at least inspired by functional programming), there are times when imperative, mutable-based thinking is more convenient -- or at least more natural to those of us who don't use functional languages every day. More to the point, I found myself wondering: what's really the problem with mutability?

The following Java for loop relies on mutability, but I don't think any programmer could really argue it's hard to follow or especially bug prone:

List<Long> times = new List<>();
for (int i = 0; i < 5; ++i) {
  times.add(System.nanoTime());
}

Put aside the fact that this code isn't very interesting; my point is that it's not very difficult. You can see exactly what it does and what effects it'll have on other code (namely, none). Contrast that to this almost identical example:

List<Long> times = this.getList();
for (int i = 0; i < 5; ++i) {
  times.add(System.nanoTime());
}

It's entirely unclear what this code will do. Will it throw an UnsupportedOperationException from add? Will it succeed, but in doing so break an invariant that another user of the list depends on? Will it modify a list that isn't threadsafe but had been (safely) published to another thread, in the expectation that it wouldn't be modified? Yikes. The problem is not mutability, it's mutable state that exists outside of the current scope. When we created the list in the first method, it was wicked easy to reason about changing it. It's when we got the list from somewhere else that things got complicated. If we had saved the reference to times in the first example to a field in the class, we'd again have to worry that somebody else will change it.

What it amounts to is that mutable state can pull the rug out from under you. I think I have an idea of how to address that without throwing the baby out with the bathwater; stay tuned for a followup post!