Yuval Shavit: Semi-mutability in Project Effes

Mutable data can lead to confusing code, but it's also useful when it lets developers think in simpler, more convenient patterns — like for loops instead of recursion. How would I apply this to my pipe dream of a language, Effes?

The first thing is to identify the areas where I think mutability would be useful. Multithreading is an obvious pick, but I'll put that aside for now; I'm not sure yet what sort of multithreading model I want to have in my language, if any. Some others are for loops, streams (especially input streams, since output streams can appear to have no state to the program) and building up data structures which would be inefficient to implement without mutable data (hash maps, for instance).

In all these cases, I'll be relying on my conclusion from the previous post (link above), which is that the problem is not mutable data, but data which is mutated outside the current scope.

Loops

I've written some toy programs in Haskell, and I'm getting the hang of always using tail recursion and folds. But I've got to admit, a lot of times I think in terms of conventional loops. Sure, I can translate those loops to their functional equivalents, but why should I? Even Haskell makes some grudging concessions with its do notation, which sugars purely functional concepts into a more palatable, imperative style.

The solution here is simple: local variables can always be reassigned, and Effes will support the usual control flow statements: for, while, etc. I'll probably have lambdas, in which case they'll close over the variable as it was at the lambda's creation — the variable within the lambda will not update if the local variable changes, as that would allow cross-scope mutable data.

Input streams and mutable data structures

Input streams are an interesting beast. They have to be mutable in some sense, because it's pretty useless to open a file handle and then keep reading the first byte from it.

Haskell addresses this problem via its laziness. I won't get into the details, but Haskell's laziness has its own issues, and I don't want to go down that path. I'm leaning toward a different solution: you can have mutable state, but you can only use it in limited ways:

You can always return mutable data, in which case it will be mutable for the call site that uses it
You can pass mutable data to a function call, but at that point you cannot use that data anymore in the calling function
You can store mutable data in a data structure, including a lambda, but that structure is then also considered mutable; and the original reference is now illegal to use, as in the previous point.

The first two factors combine to prevent data from being changed from under you, while still permitting things like input streams. You would get an input stream, which is mutable, from some source; read read read; and then return pass it to your call site. But no method you call will ever read bytes that you expected it not to consume; for it to do that, you would have to pass it to a method, but at that point you're not allowed to use it anymore.

If you do want to continue using mutable data after you pass to it a method, the method you call will have to return it back to you. This acts as a big, unavoidable warning post: what I'm getting back is not under my control, so I'd better read the function's docs to see what its return value is.

This lets us do lots of useful things. You can get mutable input streams; you can wrap them into filtering/transforming input streams that you then use instead; you can build hash maps efficiently; and you don't have to worry that code outside of the few lines in the current method will have done something you don't expect. Alternatively, you don't have to rely on some code far away "fixing up" your data structure at some point in the maybe-future, because they're not allowed to.

All together now

Here are some examples. First the basics: data is immutable and can be used without restriction — except, of course, that you can't modify it. References can be re-assigned as you please:

immutableFoo = [1, 2, 3] # immutable
immutableFoo = [7, 8, 9]
immutableFoo.append(7) # Error!
# Can't modify immutable data.
someMethod(immutableFoo) # okay
anotherMethod(immutableFoo) # okay

Mutable data is only slightly more involved. Once you pass a mutable structure to a method or use it to create a lambda (or other data structure), you can't use it directly anymore.

mutableBar = mutable [4, 5, 6]
for (i = 0; i < 5; ++i) {
  mutableBar.append(i)
}
someMethod(mutableBar)
anotherMethod(mutableBar) # Error!
# mutableBar was passed to a method
# and is now invalid in this scope
 
mutableBar = mutable [1, 2, 3] # Re-assign the var
lamb = () -> mutableBar.append(4)
someMethod(mutableBar) # Error!
# mutableBar was used to construct a lambda
# and is now invalid in this scope

Mutability is contagious. Any structure that contains mutable data is considered mutable itself, even if that mutability can't be directly accessed.

lamb = () -> mutableBar.append(4)
lamb()
yetAnotherMethod(lamb)
lamb() # Error!
# lamb was constructed with mutable data and is
# thus mutable. It therefore became invalid when
# we passed it to yetAnotherMethod

The solution is to return back a mutable structure. Here we pass a mutable array to niceMethod, at which point we can't use it; but we get back a different (as far as we know) mutable list, which we then reassign back to mutableBar.

mutableBar = mutable [1, 2, 3]
mutableBar = niceMethod(mutableBar)
mutableBar.append(4)

Yuval Shavit

Wednesday, May 29, 2013

Semi-mutability in Project Effes

Loops

Input streams and mutable data structures

All together now

No comments:

Post a Comment