Thursday, May 30, 2013

Weak typing is weak

I spent a couple hours the other day digging through layers of some unfamiliar code to track down a bug that ended up being a really simple, one-line fix. That's nothing spectacular, but this bug had to do with weak typing, which gives me a good opportunity to rant against that. Here was the fix, with some variable names changed to protect the innocent:

--- foo = bar[key] or null +++ foo = if bar[key]? then bar[key] else null

The idea was to get bar[key] if there is such an entry, or else default to null. I recognize that pattern because I've seen it before, and I've written similar code. It's a common pattern; the CoffeeScript page even uses it in an example. (In case you're not familiar with CoffeeScript/JavaScript, the original line works because a or b in CoffeeScript means "a if a is a true value, otherwise b." It's used as a quick get-or-default pattern.)

The problem arises if bar[key] is a defined, useful value that happens to also be a false value — like an integer 0, an empty string, or (to state the obvious), the boolean false. In that case, since the first expression is a false value, the whole expression evaluates to the second expression, null; the original line translated a 0 into a null.

In other words, the problem is that the first expression in a or b doesn't have to be a boolean, it just has to be interpretable as a boolean — and it just so happens that any type in JavaScript/CoffeeScript can be interpreted as a boolean, meaning that it's really easy to use the a-or-b-as-defaulter pattern, which happens to be wrong in many cases. And come to think of it, why are we using the same construct for boolean logic and getter-with-default, which are two completely different concepts?

Here's my issue with that: it's always valid code, and sometimes it's a valid pattern, but the language doesn't tell you when. And that, in a nutshell, is my beef with weak typing.

Now, there are lots of ways to write bugs in code, and it's reasonable to ask why I'm singling out this one class of bugs. I would argue that whereas other bugs are due to misrepresenting a problem, or to applying the wrong solution to a problem, this bug is due to applying the wrong dialect  of the correct solution. Ugh. Debugging is enough of a pain without the compiler encouraging us to write the programming equivalent of a tongue-twister. Weak typing makes it easy to write bugs, and, once you've written them, the type system does absolutely nothing to help you find them.

When it comes to programming, I want the computer to mean what I say. The quid pro quo is that I need to say what I mean. If what I mean is "bar[key] if it exists, otherwise null," then the language should require that I just say that. It's well and good for a language to make that pattern succinct, but blurring the lines between boolean logic and plain-old-values is funky reasoning that can lead to funky bugs.

Wednesday, May 29, 2013

Semi-mutability in Project Effes

Mutable data can lead to confusing code, but it's also useful when it lets developers think in simpler, more convenient patterns — like for loops instead of recursion. How would I apply this to my pipe dream of a language, Effes?

The first thing is to identify the areas where I think mutability would be useful. Multithreading is an obvious pick, but I'll put that aside for now; I'm not sure yet what sort of multithreading model I want to have in my language, if any. Some others are for loops, streams (especially input streams, since output streams can appear to have no state to the program) and building up data structures which would be inefficient to implement without mutable data (hash maps, for instance).

In all these cases, I'll be relying on my conclusion from the previous post (link above), which is that the problem is not mutable data, but data which is mutated outside the current scope.

Loops

I've written some toy programs in Haskell, and I'm getting the hang of always using tail recursion and folds. But I've got to admit, a lot of times I think in terms of conventional loops. Sure, I can translate those loops to their functional equivalents, but why should I? Even Haskell makes some grudging concessions with its do notation, which sugars purely functional concepts into a more palatable, imperative style.

The solution here is simple: local variables can always be reassigned, and Effes will support the usual control flow statements: for, while, etc. I'll probably have lambdas, in which case they'll close over the variable as it was at the lambda's creation — the variable within the lambda will not update if the local variable changes, as that would allow cross-scope mutable data.

Input streams and mutable data structures

Input streams are an interesting beast. They have to be mutable in some sense, because it's pretty useless to open a file handle and then keep reading the first byte from it.

Haskell addresses this problem via its laziness. I won't get into the details, but Haskell's laziness has its own issues, and I don't want to go down that path. I'm leaning toward a different solution: you can have mutable state, but you can only use it in limited ways:
  • You can always return mutable data, in which case it will be mutable for the call site that uses it
  • You can pass mutable data to a function call, but at that point you cannot use that data anymore in the calling function
  • You can store mutable data in a data structure, including a lambda, but that structure is then also considered mutable; and the original reference is now illegal to use, as in the previous point.

The first two factors combine to prevent data from being changed from under you, while still permitting things like input streams. You would get an input stream, which is mutable, from some source; read read read; and then return pass it to your call site. But no method you call will ever read bytes that you expected it not to consume; for it to do that, you would have to pass it to a method, but at that point you're not allowed to use it anymore.

If you do want to continue using mutable data after you pass to it a method, the method you call will have to return it back to you. This acts as a big, unavoidable warning post: what I'm getting back is not under my control, so I'd better read the function's docs to see what its return value is.

This lets us do lots of useful things. You can get mutable input streams; you can wrap them into filtering/transforming input streams that you then use instead; you can build hash maps efficiently; and you don't have to worry that code outside of the few lines in the current method will have done something you don't expect. Alternatively, you don't have to rely on some code far away "fixing up" your data structure at some point in the maybe-future, because they're not allowed to.

All together now

Here are some examples. First the basics: data is immutable and can be used without restriction — except, of course, that you can't modify it. References can be re-assigned as you please:

immutableFoo = [1, 2, 3] # immutable immutableFoo = [7, 8, 9] immutableFoo.append(7) # Error! # Can't modify immutable data. someMethod(immutableFoo) # okay anotherMethod(immutableFoo) # okay

Mutable data is only slightly more involved. Once you pass a mutable structure to a method or use it to create a lambda (or other data structure), you can't use it directly anymore.

mutableBar = mutable [4, 5, 6] for (i = 0; i < 5; ++i) { mutableBar.append(i) } someMethod(mutableBar) anotherMethod(mutableBar) # Error! # mutableBar was passed to a method # and is now invalid in this scope mutableBar = mutable [1, 2, 3] # Re-assign the var lamb = () -> mutableBar.append(4) someMethod(mutableBar) # Error! # mutableBar was used to construct a lambda # and is now invalid in this scope

Mutability is contagious. Any structure that contains mutable data is considered mutable itself, even if that mutability can't be directly accessed.

lamb = () -> mutableBar.append(4) lamb() yetAnotherMethod(lamb) lamb() # Error! # lamb was constructed with mutable data and is # thus mutable. It therefore became invalid when # we passed it to yetAnotherMethod

The solution is to return back a mutable structure. Here we pass a mutable array to niceMethod, at which point we can't use it; but we get back a different (as far as we know) mutable list, which we then reassign back to mutableBar.

mutableBar = mutable [1, 2, 3] mutableBar = niceMethod(mutableBar) mutableBar.append(4)

Tuesday, May 28, 2013

The problem with mutable types

Generally speaking, languages focus mostly on either mutable or immutable types. Of course, it's not quite as clear-cut as that n practice; the usual caveats about broad brushes apply. For instance, Haskell usually describes itself as having immutable types (at least in the intro-ish books), but its MVar is mutable; C and C++ have generally mutable types, but const can give you immutability.

I wasn't sure at first whether I'd want my new language to have mutable types. Even though I want my language to be largely functional (or at least inspired by functional programming), there are times when imperative, mutable-based thinking is more convenient -- or at least more natural to those of us who don't use functional languages every day. More to the point, I found myself wondering: what's really the problem with mutability?

The following Java for loop relies on mutability, but I don't think any programmer could really argue it's hard to follow or especially bug prone:

List<Long> times = new List<>();
for (int i = 0; i < 5; ++i) {
  times.add(System.nanoTime());
}
Put aside the fact that this code isn't very interesting; my point is that it's not very difficult. You can see exactly what it does and what effects it'll have on other code (namely, none). Contrast that to this almost identical example:
List<Long> times = this.getList();
for (int i = 0; i < 5; ++i) {
  times.add(System.nanoTime());
}
It's entirely unclear what this code will do. Will it throw an UnsupportedOperationException from add? Will it succeed, but in doing so break an invariant that another user of the list depends on? Will it modify a list that isn't threadsafe but had been (safely) published to another thread, in the expectation that it wouldn't be modified? Yikes. The problem is not mutability, it's mutable state that exists outside of the current scope. When we created the list in the first method, it was wicked easy to reason about changing it. It's when we got the list from somewhere else that things got complicated. If we had saved the reference to times in the first example to a field in the class, we'd again have to worry that somebody else will change it.

What it amounts to is that mutable state can pull the rug out from under you. I think I have an idea of how to address that without throwing the baby out with the bathwater; stay tuned for a followup post!

Sunday, May 26, 2013

Introducing: Project Effes

As I was just saying, the primary goal of this blog is to talk my way through my pipe dream of a new language. I'm calling this language Effes, for a couple of reasons. The language will be functional, and it'll be geared towards scripting rather than hard-core, general programming. F, S... "Effes." It's also the Hebrew word for zero, which is about how far I can reasonably expect my new language to go.

The world already has a bazillion programming languages, nearly all DOA. I expect mine to join that list; in fact, it'd be a major accomplishment if it were even significant enough for someone to add it to such a list. So then, why the hell?
  • It'll be fun.
  • I'll need to implement it in some other language (at least at first?), and that will give me an excuse to try out Haskell or Scala. Or both?
  • I think I have some neat ideas, and I want to try them out
  • There's some semblance of a need, in that I'm not really thrilled with any of the current options for scripting. At least, not the ones I know of.
Those points are ordered pretty much in descending order of validity. Working on a project just because it'll be fun, well, nobody can argue with that. Thinking I'm going to actually solve a slightly-annoying problem is pretty wishful. But hey, why not.

Besides which, anything I come up with can't be uglier or slower than Ruby, right? No, but seriously.

In the next few blog posts, I'll describe some of those neat ideas I have. We'll see if this blog gets any further than that. I'll also throw in a rant about Optional<T> in Java. How's that for a teaser?

Saturday, May 25, 2013

What, again?

Alright, this is take three now. After a failed attempt at a "personal ramblings" blog, and another failed attempt at building the next huffpo, I think it's time to once again litter the web with my crap and mixed metaphors.

What makes this blog different from the previous two? In truth, probably nothing; I'll be surprised if it makes it past a dozen posts. We'll see.

Well then, why put the blog up? In part because I just bought a domain and need something to put here, and in part because this time I have some idea what I want to write about. I've been tinkering with an idea for a new programming language — because if there's anything the world needs, it's another programming language — and I wanted a place to explore some of those ideas, as well as general thoughts about programming. I might throw in some politics too. I don't know, I don't know if we'll have enough time!

PS: One of the earlier posts in my "personal ramblings" blog was about a set I humbly named Shavitian Subprimes. It's a cool little set, in which not only is every number prime, but every substring of the number forms a number that's also prime. For instance, 373 is such a number because 373, 73 (from 373), 37 (from 373), 7 (from 373) and 3 (from 373 and 373) are all prime. A friend burst my bubble when he told me that this set has already been discovered and cataloged. Womp womp.