Tuesday, September 10, 2013

Maybe it's optional?

A lot of functional and functional-inspired languages don't have the concept of null. Instead, they have types called Maybe or Optional — basically a box of 0 or 1 items. Effes is going to take that approach, but I might put a twist on it.

In a nutshell, the idea behind Maybe (I'll settle on Haskell's terminiology) is that there's a Nothing that represents the absense of something, a Just Foo that represents one Foo, and a Maybe Foo type which can be either a Nothing or a Just Foo.

Like other functional languages, Haskell has syntax (called pattern matching) that's kinda-sorta like an instanceof check plus a downcast in Java. Putting it all together looks something like this:

sayHi :: (Show e) => e -> String
sayHi Nothing = "Nothing to say hi to!"
sayHi (Just e) = "Hello, " ++ (show e) ++ "!"

(The (Show e) => syntax just means that e has a show method, which is like Java's toString.) In Effes, a direct translation would be a disjunctive type:

data Nothing
data Just[A] = elem : A
type Maybe[A] = Nothing | Just[A]

sayHi (m:Maybe[A Stringable]) -> String:
  case m of
      Nothing: "Nothing to say hi to!"
      Just e: "Hello, {{e}}!"

Because Effes has a more flexible type system, we can actually get away without the Just part of the Maybe pair. Instead, it looks something like this:

data Nothing
type Maybe[A] = Nothing | A

sayHi (m:Maybe[A Stringable]) -> String:
  case m of
      Nothing: "Nothing to say hi to!"
      e: "Hello, {{e}}!"

There's not a really strong driving force for this, except that it seems a bit cleaner. Instead of a Maybe being "either nothing or a box of one something," it's "either nothing or one thing." Plus it takes advantage of my cool new type system, so that's nice too.

The problem is when the A type is itself a Maybe: Maybe[Maybe[A]]. If we see that it contains a Nothing, does that mean we didn't have anything, or that we had one Nothing? To prevent unreachable code, I'd probably want the type checker to reject this altogether: Maybe[Maybe[String]] would be a type error.

That's not terrible, I guess, but the erroring type could be nestled in some data structure. For instance, if a linked list uses Maybe to signify an end, then LinkedList[Maybe[String]] wouldn't compile — and probably with some unintuitive or frustratingly un-actionable error message.

On balance, I'm leaning towards keeping the Just type. It doesn't add much complexity to the Maybe type, pattern matching keeps the call sites simple, and it eliminates ambiguity.

Friday, September 6, 2013

(Finally!) A use case for composed objects

I've been playing with Effes syntax off and on for the last couple weeks, and I happened across a use case for conjunctive types. I'll admit: even though I thought the idea was neat when I came up with it, I never knew what I'd use it for — it just seemed like it should be useful for something. So it was nice that a use case came up without me actively looking it.

Having mostly settled on the smaller syntax stuff like function declarations, I decided to start putting it together by writing a JSON parser. The JSON type itself is a disjunctive type, which is very intuitive:

type Json = Null | String | List[Json] | ...

But as it turns out, conjunctive types — informally, "TypeA = TypeB and TypeC" — are useful as the parsing function's result type.

Parsers often break the principle of single responsibility for the sake of usability: they're responsible for parsing text and providing context in the event of failure. It's not a major problem, but it does pose some design questions. For instance, how much context should the parser keep? The current line number? Column number? Previously seen text? How much previous text? Note that it doesn't need any of this for its main task, which is parsing.

In my parser, I solved this by having the parser return either a Json object or a failure composed with the remaining string: Failure("expected a comma") @ string (I've picked @ as my composing token.) The full result type is Json | (Failure @ String).

That doesn't sound exciting at first, but remember that a string is just a list of characters, and a character is itself an object — meaning it can itself be composed! In other words, what the parser thinks of as List[Character] might actually be List[Character @ MyContextFoo]. That context component could contain the current line and column position, the previous few chars of context, etc.

That means that the parser is agnostic of all that contextual information, which after all it doesn't need. It keeps chipping away at the string, oblivious of the fact that the characters it sees are in fact rich objects instead of plain Chars; in fact, the string is oblivious of that, itself.

At the application level, parsing is now a three-step process: get the input string, transform it into a string of "rich" characters (basically by mapping each character and then using those mapped characters to create a new string), and parsing that enriched string.

One problem I haven't resolved yet is how to statically type all this. It's not enough for the programmer to know that the string is enriched; how can they convey this to the type system in such a way that the information gets through to the result type? This is important, because otherwise the language devolves to a static-dynamic hybrid. That's a slippery slope that will probably lead to mostly dynamically-typed programs.

I don't want the parser to have to know that it's working with a subtype of String if that's only relevant to the result type. If nothing else, that puts the onus on the API author to always use general (and presumably more verbose) declarations; the strength of this idea is that it puts the power in the call site's author's hands. On the other hand, I do want the input string's "subtype" to bubble back up, so that the return type is promoted to something like Json | (Failure @ MyCoolString). I have some ideas, but the devil is in the details...