Yuval Shavit: object composition

Showing posts with label object composition. Show all posts

Monday, March 17, 2014

Inheritance is dead, long live composition

One aspect of the type system that’s always left me unsatisfied is its asymmetry against traditional object-oriented languages. Most OO languages formally recognize inheritance within the type system, but not composition. Given that Effes formally recognizes composition, shouldn’t it not recognize inheritance?

This is important to me for more than just aesthetic reasons. Recognizing both patterns makes for a more complicated type system, but worse, it gives the programmer a too-easy crutch. One of the reasons I turned to Haskell when I was interested in learning about functional programming was that I wanted to force myself to really start thinking in FP terms. If I were learning on a language like Scala, which combines OO and FP patterns, it’d be too easy to fall back on familiar ways of looking at a problem.

In the same way, I want Effes to force me into thinking with a composition-based perspective, rather than letting me have another inheritance-based language with a shot of composition flavoring.

The hurdle, though, has been polymorphism. It’s useful to have a method that takes Sizeable objects, whether they’re List, Map or anything else that’s Sizeable. It’s also nice to have that size method on both List and Map.

My solution is to replace “List is-a Sizeable” with “List has-a Size component:”

type List[A]:
    has Size
    add A -> List[A]
    -- etc...

For a user of List to get to the size method, they’ll need to access its Size component, which can be done explicitly with (list @ Size) size. But, if the Size component doesn’t conflict with any other of List’s components, you can implicitly access it: list size. And similarly, if a method takes a Size argument, you can explicitly give it the list’s Size component by calling theMethod (list @ Size), but you can also just call theMethod list, and the compiler will figure out that you want to pass it the Size component.

A nice side benefit of all this is that it provides a nicer answer to the question of conflicting components, which I addressed in earlier posts. Rather than handling conflicts at composition time by knocking out some components, I’ll allow the conflict there, and force the user into stating which component they want, when there’s a conflict. So for instance, if List and Set both have an add method, you can’t write listAndSet add foo. You have to explicitly call out the component you want: (listAndSet @ List[Foo]) add foo.

There are two syntax details I have yet to work out with this all-composition scheme.

The first involves cleaning up the code when a type has only one component: ConsList[A] “implements” List[A], for instance. Everything is fine from a useage perspective, but it’s a bit awkward to write out:

type ConsList[A]:
    has List[A]:
        -- all of the ConsList code goes here

So, I’m thinking of allowing a special “is-a” statement for this situation, which just lets you inline the second line in the above:

type ConsList[A] is List[A]:
    -- all of the ConsList code goes here

The second is in cleaning up implementations of nested types. Remember how List had a Size component above? Does that mean we have to implement it as:

type ConsList[A] is List[A]:
    add elem: ...
    Size:
        size: ...

or can we just write:

type ConsList[A] is List[A]:
    add elem: ...
    size: ...

My inclination here is to mirror the call site rules: you can inline the method definitions for a given component if that component doesn’t conflict with any other components. That keeps things simple, consistent and clean.

Sunday, January 12, 2014

A solution to the object-equality problem

As I mentioned in my last post, there are certain methods that require transitivity — and that means both objects must agree on a single implementation of those methods. For instance, if a Ball expects to compare itself against another Ball using only its radius, a WeightedBall cannot use radius and weight, even if it's being compared to another WeightedBall. If it did, transitivity could break when a plain, unweighted Ball joins the party.

My solution in Effes is to define compiletime methods, which look like normal instance methods but are in fact similar to Comparator objects whose instances are picked at compile time. This is basically an implementation of the strategy pattern.

To see this in action, consider the following snippet:

type Eq:
    [compiletime] == (other: This) -> Bool

data type Ball(radius: Int):
    is Eq
    [override compiletime] (== other):
        radius == other.radius

type WeightedBall:
    is Ball -- also inherits Eq
    weight -> Int:
    [override compiletime] (== other):
        radius == other.radius && weight == other.weight

    ByRadius:
        [override compiletime] (== other): (this: Ball) == other

First we define an Eq type, with a compiletime method named ==. This method takes one argument, which must be of the same type as the declaring class (the This class is a reserved class name, just as this is a reserved variable name in many OO languages, including Effes). The This restriction is re-applied in each specific type, so Ball == must take a Ball argument (not just a Eq).

The WeightedBall subtype of Ball redefines == to include the two balls' weights. But it also defines a "variant" called ByRadius, which only compares radii. It does this by casting itself down to a plain Ball, and then calling ==, such that this call to == has a most specific implementation of (Ball ==).

To someone trying to compare two objects, the == method looks like a normal method call. But under the hood, Effes delegates it to a strategy object. Here's a simple set implementation which maintains a list of elements and just checks each new element as it comes in, to make sure it's not already in the list:

type ListSet[T:Eq]:
  val values : List[T] = Empty
  add elem:T -> ListSet[T]: if values contains T
    then this
    else values' = values push elem
  contains (needle : T): case of:
    Nothing: False
    (head, tail): case head of:
      needle: True
      _: tail contains elem

The == magic happens in the last three lines. The case head of statement tries to match head against needle, using ==. If they match, the answer is True; otherwise, we try recursively on the tail list.

As you can see, contains takes one argument, the needle we're looking for. But under the hood, it actually takes two arguments: the needle, and an "equality strategy" that holds some implementation of ==. This object is picked at compile time, ensuring that runtime polymorphism doesn't screw up transitivity.

For instance, let's try to add a heavyBall : WeightedBall to two ListSets, one typed to Ball and the other to WeightedBall:

plainSet : ListSet[Ball] = getSomeListSet
weightedSet : ListSet[WeightedBall] = getSomeListSet
s1 = plainSet add heavyBall
s2 = weightedSet add heavyBall

Note that both variables are initially assigned to identical sets, meaning that both have all WeightedBall elements. But, plainSet is typed to ListSet[Ball]. The last two lines are internally translated to something like:

s1_ = plainSet add heavyBall (Ball ==)
s2_ = weightedSet add heavyBall (WeightedBall ==)

At the == call site (in ListSet contains), the case head of ... needle snippet is translated to something like if (_equalityTester head needle), where _equalityTester is the strategy object.

If we wanted the set to contain WeightedBalls, but only restrict to unique radii, we would write:

uniqueRadii : ListSet[WeightedBall:ByRadius] = getSomeListSet
s3 = uniqueRadii add heavyBall

The WeightedBall:ByRadius bit tells the compiler to pick the ByRadius variant we defined above:

s3_ = uniqueRadii add heavyBall (WeightedBall:ByRadius ==)

There's one important bit to note here. Comparing two objects by radius is going to give you a different result than comparing them by radius and weight; there could be more duplicates. So when we cast the result of getSomeListSet to ListSet[Ball], that's not the relatively light operation that a cast usually is. We have to actually rebuild the ListSet so that it knocks out those extra duplicates. I'm still working on the mechanism for this; maybe the casting is only allowed if the type defines a method for it, something like:

type ListSet[T:Eq]:
    @cast ListSet[T2]: ListSet[T2]() addAll values

The signatures of these compiletime methods are a bit deceitful. They look like instance methods, and even define this but they're actually just normal two-arg methods. This is of course the same for all instance methods (this is always just an implicit argument), but it feels like more of a lie here. I think it's because the this argument isn't even semantically special; the use case for compiletime methods is precisely that this and other are on the same footing!

The reason I went for the cheat is simply that it feels more natural to me, even if it's a bit inaccurate. It lets us write things like a == b (instead of == a b, as would be the case for a non-instance method) without tricky rules for declaring methods as inline. My goal from the beginning was to try to get the best of both worlds that Java offers: the correctness of Comparators with the natural feel of this.equals. I think this approach succeeds in that goal, at the cost of pretty minimal sugar/lies.

Wednesday, December 4, 2013

An argument for using both composition approaches

A recap of where we are with object composition: In my last few posts about object composition, I initially thought that simple composition methods don't work because of mismatches between compile-time and run-time information.

Instead, I thought I'd have to go with a model analogous to Schrodinger's cat, where a composed object's methods can be bound to multiple implementations at the same time. Each of those implementations is run, and the results are themselves composed into a single object for the result value. Like a cat that's both alive and dead, the method is bound to both implementation A and implementation B — and its result is both A' and B'.

There's a certain beauty to that, but it's pretty complicated. I also suspected it may result in slow code as objects get more and more complicated at runtime. So I thought some more and came up with a solution that leads to a simple, non-Schrodinger object.

But maybe these concerns are premature on my part. I'm basically trying to do two optimizations — for simplicity, and for performance — without having measured how bad they are.

On the other other hand, back when I first proposed the Schrodinger approach, I noted that eventually you'll need to collapse the wave function, so to speak. At the end of the day, you need to print "I'm a tea pot" or 3.14159; you can't print their superpositioned composition.

So then, maybe the solution is to use both approaches. The Schrodinger approach will be the standard composition mechanism (not in any formal way, but as a coding recommendation), while the "simple" approach will be used to collapse the wave.

To anchor things, let's call the standard composition operator <>, and the simple one </ (you'll see why in a bit).

I'll also define a reserved type Simple which can only be composed into other objects using the simple composition operator. So, if a method takes Simple Foo Bar and you have a Foo Bar (which may be superpositioned), you need to compose it using the simple operator: Simple </ myFooBar.

This handles the print case, for instance. stdout.print will take a Simple String, and it's up to the caller to ensure that its variable is already Simple, or to collapse it otherwise.

One of the lingering questions I had in the back of my mind was what to do when two objects of the same type are composed: SomeInt(1) <> SomeInt(2). Now the answer seems obvious. With the Schrodinger composition operator, just superposition them both in. With the simple operator, the left one will get folded in first, and the right one will then collide and be discarded.

That's why I picked </ as the operator. It looks like it points to the left, and thus conveys that the left operand has precedence in case of collisions.

There's not really a compelling argument for this complexity. Why don't I just stick with the simple composition approach? Basically because I think the Schrodinger variant seems fun and worth playing around with.

Monday, December 2, 2013

Saving object composition from complexity

In my last post about object composition, I concluded that simple composition approaches don't work because of the mismatch between compile-time and run-time information. But I realized a few days later that I can define a simple composition.

The trick is to cheat: come up with the answer I want first, and then work backwards to fill in the rest!

What I do is to add a new step at the beginning of composition. In this step, we start out with a "composition target." We then fold the composition operands' components in two phases: first the ones that are part of the target, and then the rest.

What's the target type? At compile time, it's empty; at run time, it's the compile-time type.

As always, I'll illustrate with an example. In fact, I'll use the same example as before. Here's the background stuff:

type List[A] =
    ... (put/get)
    size -> Int
type Box[A] = ... (put/get)

type Container:
  isEmpty -> Bool -- abstract method

type Container Box[A]:
  @override isEmpty -> Bool = TODO

intsList : List[Int] = list(1, 2, 3)
intsBox : Box[Int] = box(4)

tmp = intsBox <?> Container
intsContainer : Container = tmp

... and here's the composition:

composed = intsContainer <?> intsList
composedSize = composed2.size

The problem we had before was that intsContainer is only a Container at compile-time, but it's a Box[Int] <?> Container at runtime. When Container is composed with List[Int] at compile time, no conflicts are found, so the resulting type is Container <?> List[Int]. But at run time, Box[Int] and List[Int] collide and cancel each other out, and the resulting object is only a Container, which doesn't have a size method.

The composition target saves the day. Everything works the same at compile time, leading to a type of Container <?> List[Int]. At run time, we take the following steps:

Decompose both operands, so that the composition is (Box[Int] <?> Container) <?> List[Int].
The target type is Container <?> List[Int], so take those components out of the composition and fold them into the resulting object. We now have:
- A folded object of Container <?> List[Int]
- Remaining object of (Box[Int] <?> ∅) <?> ∅, which simplifies to just Box[Int].
Fold the rest of the components (just Box[Int]). Box[Int] collides with List[Int], so discard it — and there's nothing else to fold.

So the resulting object is a Container <?> List[Int], which is exactly what we expected at compile time!

I wrote above that the target type at compile time is empty, but even that can be improved upon: it's the closest available explicit type declaration, if any. So, if you had:

boxContainer = intsBox <?> Container
composed2 = boxContainer <?> intsList

... then it would result in a compile-time error, since intsBox : Box[Int] collides with intsList : Box[Int] (they both define put and get methods). But if you had:

composed3 : List[Int] = boxContainer <?> intsList

... then the target type is List[Int], meaning that this gets folded in first. When the conflict with Box[Int] is detected in the second folding stage, Box[Int] is discarded. The resulting type of the composition (this is all at compile time, remember) is List[Int] <?> Container, which is then "upcast" to List[Int] when it's assigned to composed3.

Tuesday, November 19, 2013

I'm going to have to maybe-kill the cat

In my last post, I discussed problems with the runtime binding of composed objects: if object a has a method foo, and object b also has a method foo, then how does the composed object (a <?> b) behave? In this post, I'd like to explore a separate but related pair of problems: what's the type of a <?> b, both at run-time and compile-time?

(Before you get too far into this post, a disclaimer: this post describes a method of composition and then explains why it doesn't work. If that strikes you as meandering and pointless — if you prefer to read about ideas that might work, rather than those that definitely won't — then you might want to skip this post.)

As before, let me set this up with an example. As an added bonus, you'll get to see the latest revision of Effes' syntax, which is pretty close to complete (now I really mean it!). Since I haven't laid out how composition works, I'll use <?> as a placeholder for "some variant of composition." With that said:

type List[A] = data Node(head:A, tail:List[A]) | Nothing
  put e:A -> List[A] = TODO
  get -> (Maybe[A], List[A]) = TODO

type Box[A] = Maybe[A]:
  put e:A -> Box[A] = TODO
  get -> Maybe[A] = TODO

myList : List[Int] = TODO
myBox : Box[Int] = TODO
myContainer = myList <?> myBox

Here we have two simple types: one representing a linked list, and one representing a box that can contain zero or one items. Their details aren't important. We also have three references: one of type List[A], one of Box[A], and one representing the composition of the first two.

As I discussed in the last post, we have some options with a call like myContainer.put 1. We can take the Schrodinger approach, in which we invoke put on both components of myContainer and then compose those two results; or we can pick one as the "winning" binding and only invoke it.

Let's say we pick the single-bound, "winner" approach; that's the option I was leaning towards in the last post. To make it concrete, let's say that the left-hand side always wins out. That seems reasonable, but what about this situation:

myContainer2 = myContainer.put 123
myVal : Maybe[Int]
myVal = case myContainer2 of
  Box[Int] -> myContainer.get
  _ -> Nothing

In this snippet, we check to see if myContainer is a Box[Int]. It is, so the first pattern will match. This casts myContainer2 to Box[Int] and, within that new context, invokes get. Note that get here has to be bound to the Box version; the List[A] version has a different return type (it returns a pair containing the list's possible head and its tail).

The problem is that myContainer.put was bound to the List[A] method, meaning that the Box component of myContainer never had 123 inserted into it: the value of myVal is Nothing, not 123! This is fully consistent, but it's confusing and violates the principle of least surprise. There has to be a better way!

One possibility is to limit composition such that myContainer2 does not have a runtime type of Box[Int]: the second (catch-all) pattern matches, and the value of myVal is Nothing.

Of course, we always want the runtime type to be a subtype of the compile-time type, so this new approach means that the composed object's compile-time type can't include Box[Int]. We can't generalize that to all compositions, or else the whole idea of composition falls apart and becomes shorthand for the not-very-useful function a <?> b = a.

We can come up with a more precise limit on composition, one that doesn't just throw away the RHS altogether. One approach to compose objects in multiple phases. Here's one such algorithm, in rough terms:

Decompose the RHS object to its constituent objects.
Fold each one of those into the LHS, one at a time. As you fold each object, check to see whether it has any methods that conflict with methods on the composed object; if so, ignore that object (don't fold it in).
Check whether the resulting object has any abstract methods (that weren't given a concrete definition in the composed object). If so, remove the objects that introduced those methods.

These three phases happen separately for both the compile-time type and the runtime type. Here's an example, starting with some types and some objects:

type List[A] =
    ... (put/get, as above)
    size -> Int
type Box[A] = ... (put/get, as above)

type Container:
  isEmpty -> Bool -- abstract method

type Container Box[A]:
  @override isEmpty -> Bool = TODO

intsList : List[Int] = list(1, 2, 3)
intsBox : Box[Int] = box(4)

intsContainer1 = intsBox <?> Container

The List[A] and Box[A] types are as before, except that List[A] now also has a size method.

Let's look at that last line. The LHS is Box[Int] with methods put and get, while the RHS is Container with just one method, isEmpty. There's no overlap in methods, so the composition is straightforward and results in Box[Int] <?> Container for both the compile-time and runtime type.

Okay, so that case works fine. But what about this?

intsContainer2 : Container = intsContainer1
composed2 = intsContainer2 <?> intsList
composedSize = composed2.size

At compile time, that last line looks like Container <?> List[Int], which has no conflicts and thus doesn't remove any types. But the Container component's isEmpty method isn't implemented, so it's removed: the resulting type is List[Int]. We call its size method, which is a pretty reasonable operation to call on a list.

At runtime, the composition looks like (Box[Int] <?> Container) <?> List[Int], and there is a conflict: List[Int]'s methods collide with Box[Int], and we therefore don't fold List[Int] into the composition. That means that the resulting type is just Box[Int] <?> Container — the inverse of the compile-time type! And in particular, there's no size method on that object. Boom.

I've tried a few variants on this sort of decompose-and-recompose theme: decomposing both sides, different handling of abstract methods, etc. Inevitably, the mismatch between runtime and compile-time types always blows up in my face. I don't think there's a way around it.

Unfortunately, I think this leads me to the conclusion that if I'm going to do composition, I have to do composition all the way: Schrodinger-style. This means I'll need to figure out various issues about collapsing the wave form: how does pattern-matching work, and what happens when an object is composed with an object of the same type? For instance, how does "foo" <?> "bar" work?

Friday, October 11, 2013

Of object composition and a maybe-dead cat

There's a problem with Effes that could prove deadly, at least to object composition: composed objects can conflict in ways that can't be caught at compile time, and there's no satisfying, simple way to resolve the conflicts. It's a problem I realized a few weeks ago while thinking about various use cases for composition.

To warm up, consider these types:

Animal
Dog is Animal
Happy
(Happy Dog) is Animal

Dog and Happy are simple types, and (Happy Dog) is their conjunction. Animal is an "interface-y" type with implementations provided by Dog and (Happy Dog). Let's say balto has a compile-time and runtime type of Happy Dog, and that Animal provides a method speak. Does balto speak invoke the implementation for Dog or Happy Dog?

That's not so tricky (I said it was a warmup!). Happy Dog is intuitively more specific than Dog, so we should invoke its speak. Let's try something just a bit trickier:

type Animal:
  speak -> String
data Dog is Animal:
  speak = "Woof!"
data Cat is Animal
  speak = "Meow!"

balto = Dog
felix = Cat
dogCat = balto @ felix
spoken = dogCat speak

This is trickier. The object dogCat is both a Dog and a Cat, both of which provide equally-specific overrides of Animal::speak. Which gets invoked? If your instinct is to just reject that at compile-time, consider this slightly more dynamic variant:

type Animal:
  speak -> String
data Talking, Dog, Cat -- no inherent relationships to Animal
type Talking Dog is Animal:
  speak = "Woof!"
type Talking Cat is Animal:
  speak = "Meow!"

balto = Talking Dog
felix : Cat = Talking Cat
dogCat = balto @ felix
spoken = dogCat speak

Here, felix has a run-time type of Cat but a compile-time type of Talking Cat, which is Animal. This means we can't detect a conflict at compile time, but dogCat still has two equally-specific implementations of speak at runtime.

We can come up with something clever based on compile-time contortions: take only the left object's implementation when they're composed, for instance, or take only the one whose compile-time type provides an implementation and fail to compile if they both provide an implementation. These would all work for the examples so far, though they're not very inspiring.

The problem is that Effes, like other functional languages, lets you do pattern matching — that is, runtime type checks. This is the killer:

type Animal:
  speak -> String
data Talking, Dog, Cat
type Talking Dog is Animal...
type Talking Cat is Animal...

balto : Dog = Talking Dog
felix : Cat = Talking Cat
dogCat = balto @ felix

spoken = case dogCat of
  Animal: dogCat speak
  else: "whatever"

balto and felix never have a compile-time type of Animal, and neither does dogCat (its compile-time type is just (Cat Dog). Yet both of them have a runtime type of Animal, and we intuit that dogCat should, too. The question, again, is what that speak method does.

There aren't many good, simple solutions to this that I can think of. There are bad, simple solutions; and there are interesting, complex ones. These are the ones I came up with:

A runtime failure at balto @ felix, since it would create a possibly ambiguous object.
A runtime failure when dogCat is matched against Animal and then invokes speak, since this is the aforementioned ambiguity.
Pick one of the implementations by some arbitrary rule, like the left-hand side of the conjunction.
Run both methods and return the conjunction of their results.

I really want to avoid runtime failures here: it's too easy to stumble into this case. The last option is interesting in a computer sciencey way, but it creates a possibly complex world. Let's call this the Shroedinger approach, since it creates an object that's like a superpositioning of barks and meows. What if we then run that object through a pattern match?

who = case spoken of
  "Woof!": "It was a dog"
  "Meow!": "It was a cat"
  else: "What was it?"

If we took the Shroedinger approach for speak, then spoken is ("Woof!" @ "Meow!"); it can match against either of the first two cases. Pattern matching traditionally executes just the first branch that matches, but this approach is inconsistent with the Shroedinger approach that got us this spoken in the first place. So, let's stick with the Shroedinger philosophy of taking all of the cases and conjoining them: the resulting object is a conjunction ("It was a dog" @ "It was a cat"). Of course, the expression "It was a dog" could instead be some method, one which itself might return a "superpositioned" object; and the same could be true of "It was a cat". By the time we finish the case statement, who might be pretty darn complex!

In short, we have three options for handling ambiguity: runtime failure, subtle behavior that depends on composition being non-commutative, or subtle behavior in which each ambiguous decision split the program into a more ambiguous runtime state. The last of those options is the most novel and interesting, but eventually we have to collapse the wave-form: most of the time, we need to end up printing either "the dog says woof" or "the cat says meow."

One option is to provide both philosophies: two parallel case statements, one of which executes all branches and the other of which executes the first it finds. A similar option is to provide a sort of decomposition; given an object with a compile-time type Foo, return an object which is only a Foo, with all other composed objects removed.

Even if we take this approach, all this superpositioning makes the code a lot less strictly typed. The type system provides a very low bound on each object's guarantees! Traditional type systems generally make guarantees like, "it'll be a list, but we don't know what kind of list." Shroedinger-Effes will be able to say "it's an empty list, but we don't know if it's also a non-empty list."

It also raises the question of whether methods are overridden, or if they're always superimposed. Remember our first Balto, the Happy Dog? We intuited that its speak should override the "less specific" implementation provided by Dog. Does that still happen, or do we Shroedinger it up and invoke both?

There is one simpler option. I wrote above that because balto and felix each have a runtime type (but not compile-time type) of Animal we'd want their composed object to also have a runtime type of Animal. What if this weren't the case? That is, that runtime behavior could only ever override types and methods that are known at compile time, but not to perform arbitrary runtime checks such as whether dogCat is an Animal? That may be the way to go; I'll think on it for a bit.

Friday, September 6, 2013

(Finally!) A use case for composed objects

I've been playing with Effes syntax off and on for the last couple weeks, and I happened across a use case for conjunctive types. I'll admit: even though I thought the idea was neat when I came up with it, I never knew what I'd use it for — it just seemed like it should be useful for something. So it was nice that a use case came up without me actively looking it.

Having mostly settled on the smaller syntax stuff like function declarations, I decided to start putting it together by writing a JSON parser. The JSON type itself is a disjunctive type, which is very intuitive:

type Json = Null | String | List[Json] | ...

But as it turns out, conjunctive types — informally, "TypeA = TypeB and TypeC" — are useful as the parsing function's result type.

Parsers often break the principle of single responsibility for the sake of usability: they're responsible for parsing text and providing context in the event of failure. It's not a major problem, but it does pose some design questions. For instance, how much context should the parser keep? The current line number? Column number? Previously seen text? How much previous text? Note that it doesn't need any of this for its main task, which is parsing.

In my parser, I solved this by having the parser return either a Json object or a failure composed with the remaining string: Failure("expected a comma") @ string (I've picked @ as my composing token.) The full result type is Json | (Failure @ String).

That doesn't sound exciting at first, but remember that a string is just a list of characters, and a character is itself an object — meaning it can itself be composed! In other words, what the parser thinks of as List[Character] might actually be List[Character @ MyContextFoo]. That context component could contain the current line and column position, the previous few chars of context, etc.

That means that the parser is agnostic of all that contextual information, which after all it doesn't need. It keeps chipping away at the string, oblivious of the fact that the characters it sees are in fact rich objects instead of plain Chars; in fact, the string is oblivious of that, itself.

At the application level, parsing is now a three-step process: get the input string, transform it into a string of "rich" characters (basically by mapping each character and then using those mapped characters to create a new string), and parsing that enriched string.

One problem I haven't resolved yet is how to statically type all this. It's not enough for the programmer to know that the string is enriched; how can they convey this to the type system in such a way that the information gets through to the result type? This is important, because otherwise the language devolves to a static-dynamic hybrid. That's a slippery slope that will probably lead to mostly dynamically-typed programs.

I don't want the parser to have to know that it's working with a subtype of String if that's only relevant to the result type. If nothing else, that puts the onus on the API author to always use general (and presumably more verbose) declarations; the strength of this idea is that it puts the power in the call site's author's hands. On the other hand, I do want the input string's "subtype" to bubble back up, so that the return type is promoted to something like Json | (Failure @ MyCoolString). I have some ideas, but the devil is in the details...