Friday, September 6, 2013

(Finally!) A use case for composed objects

I've been playing with Effes syntax off and on for the last couple of weeks, and I happened across a use case for conjunctive types. I'll admit: even though I thought the idea was neat when I came up with it, I never knew what I'd use it for; it just seemed like it should be useful for something. So it was nice that a use case came up without me actively looking for it.

Having mostly settled on the smaller syntax stuff like function declarations, I decided to start putting it together by writing a JSON parser. The JSON type itself is a disjunctive type, which is very intuitive:

type Json = Null | String | List[Json] | ...

But as it turns out, conjunctive types — informally, "TypeA = TypeB and TypeC" — are useful as the parsing function's result type.

Parsers often break the principle of single responsibility for the sake of usability: they're responsible for parsing text and providing context in the event of failure. It's not a major problem, but it does pose some design questions. For instance, how much context should the parser keep? The current line number? Column number? Previously seen text? How much previous text? Note that it doesn't need any of this for its main task, which is parsing.

In my parser, I solved this by having the parser return either a Json object or a failure composed with the remaining string: Failure("expected a comma") @ string. (I've picked @ as my composing token.) The full result type is Json | (Failure @ String).
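As a rough illustration in Python (not Effes), the result type Json | (Failure @ String) can be approximated with a union whose failure branch carries the unparsed input alongside the failure. Python has no composition operator, so the hypothetical Composed class below pairs the two components instead; JsonNull, parse_literal, and the error messages are likewise assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class JsonNull:
    """Hypothetical stand-in for the Null variant of Json."""


# Only a few variants of the Json disjunction are modelled here.
Json = Union[JsonNull, str, List["Json"]]


@dataclass(frozen=True)
class Failure:
    message: str


@dataclass(frozen=True)
class Composed:
    """Rough analogue of `Failure @ String`: both components travel together."""
    failure: Failure
    rest: str


ParseResult = Union[Json, Composed]


def parse_literal(text: str) -> ParseResult:
    """Toy parser: accepts `null` or a double-quoted string without escapes."""
    if text.startswith("null"):
        return JsonNull()
    if text.startswith('"'):
        end = text.find('"', 1)
        if end != -1:
            return text[1:end]
        return Composed(Failure("expected a closing quote"), text)
    return Composed(Failure("expected a value"), text)
```

A caller distinguishes the two branches with isinstance(result, Composed) and, on failure, still has the remaining input in hand.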

That doesn't sound exciting at first, but remember that a string is just a list of characters, and a character is itself an object — meaning it can itself be composed! In other words, what the parser thinks of as List[Character] might actually be List[Character @ MyContextFoo]. That context component could contain the current line and column position, the previous few chars of context, etc.

That means that the parser is agnostic to all that contextual information, which after all it doesn't need. It keeps chipping away at the string, oblivious to the fact that the characters it sees are rich objects rather than plain Chars; the string itself is just as oblivious.

At the application level, parsing is now a three-step process: get the input string, transform it into a string of "rich" characters (basically by mapping each character and then using those mapped characters to create a new string), and parse that enriched string.
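The three steps can be sketched in Python, with the caveat that this only imitates composition: RichChar is a hypothetical str subclass that carries a line and column, and enrich, parse_pair, and the toy `<digit>,<digit>` grammar are assumptions invented for the example. The parser is written purely against plain characters and never touches the position data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


class RichChar(str):
    """A one-character str that also carries its position (hypothetical context)."""
    line: int
    col: int

    def __new__(cls, ch: str, line: int, col: int) -> "RichChar":
        obj = super().__new__(cls, ch)
        obj.line = line
        obj.col = col
        return obj


def enrich(text: str) -> List[RichChar]:
    """Step 2: map each character to a RichChar tagged with line and column."""
    out: List[RichChar] = []
    line, col = 1, 1
    for ch in text:
        out.append(RichChar(ch, line, col))
        if ch == "\n":
            line, col = line + 1, 1
        else:
            col += 1
    return out


@dataclass
class Failure:
    message: str
    rest: Sequence[str]  # analogue of `Failure @ String`


def parse_pair(chars: Sequence[str]) -> Union[List[str], Failure]:
    """Step 3: toy parser for `<digit>,<digit>`; it only looks at plain characters."""
    if len(chars) < 1 or not chars[0].isdigit():
        return Failure("expected a digit", chars)
    if len(chars) < 2 or chars[1] != ",":
        return Failure("expected a comma", chars[1:])
    if len(chars) < 3 or not chars[2].isdigit():
        return Failure("expected a digit", chars[2:])
    return [chars[0], chars[2]]


raw = "1;2"                       # step 1: get the input string
result = parse_pair(enrich(raw))  # steps 2 and 3
if isinstance(result, Failure) and result.rest:
    at = result.rest[0]           # at runtime this is still a RichChar
    print(f"{result.message} at line {at.line}, column {at.col}")
    # prints "expected a comma at line 1, column 2"
```

This works at runtime only because Python is dynamically typed: the annotations say rest holds plain str values, so a static checker would reject at.line even though the data is there.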

One problem I haven't resolved yet is how to statically type all this. It's not enough for the programmer to know that the string is enriched; how can they convey this to the type system in such a way that the information gets through to the result type? This is important, because otherwise the language devolves into a static-dynamic hybrid. That's a slippery slope that will probably lead to mostly dynamically-typed programs.

I don't want the parser to have to know that it's working with a subtype of String if that's only relevant to the result type. If nothing else, that puts the onus on the API author to always use general (and presumably more verbose) declarations; the strength of this idea is that it puts the power in the call site's author's hands. On the other hand, I do want the input string's "subtype" to bubble back up, so that the return type is promoted to something like Json | (Failure @ MyCoolString). I have some ideas, but the devil is in the details...
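For comparison, here is roughly what the explicit option looks like in Python's type hints: the API author writes the more general, generic declaration so that the caller's character type flows through to the failure. This is the verbose approach described above, not a proposal for how Effes should do it; PositionedChar, expect_comma, and the type variable C are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar, Union

C = TypeVar("C", bound=str)  # any character type, plain or enriched


@dataclass
class Failure(Generic[C]):
    message: str
    rest: Sequence[C]  # keeps the caller's character type


def expect_comma(chars: Sequence[C]) -> Union[Sequence[C], Failure[C]]:
    """Consume a leading comma, returning the remaining input or a typed failure."""
    if chars and chars[0] == ",":
        return chars[1:]
    return Failure("expected a comma", chars)


class PositionedChar(str):
    """Hypothetical enriched character: a str that also knows its column."""
    col: int

    def __new__(cls, ch: str, col: int) -> "PositionedChar":
        obj = super().__new__(cls, ch)
        obj.col = col
        return obj
```

Called with a list of PositionedChar, a static checker should be able to infer the failure branch as Failure[PositionedChar], which is the "bubbling up" the return type needs. The cost is that the parser's signature has to mention C even though the parser itself never uses the extra information.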
