Thursday, November 21, 2013

Half-thread-safe collections in Java

I thought about an "interesting" (ie, I'm probably the only one who finds it interesting) question yesterday. Is it possible to define a thread-safe collection in Java that doesn't transfer objects across threads safely?

I know, that sounds weird; let me explain. The thread-safe collections in the JDK all have documentation to the effect of "actions in a thread prior to placing an object into a BlockingQueue happen-before actions subsequent to the access or removal of that element from the BlockingQueue in another thread."

My question is: is it possible to write a collection such that the collection itself is thread-safe, but it doesn't provide that sort of guarantee? I'm using "collection" colloquially as "some class that provides put-like and get-like operations."

That is, given:

class Foo {
    int val;
}

// Thread A:
Foo fooIn = new Foo();
fooIn.val = 5;
weirdCollection.add(fooIn);

// Thread B:
Foo fooOut = weirdCollection.get();
assert fooOut.val == 5;

... is it possible to create a weirdCollection such that it is apparently thread-safe, yet the assertion would fail?

You might ask what I mean by "apparently thread-safe." The easiest definition might include that its implementation contains no data races, but that's too strong. After all, java.lang.String contains an intentional data race (in hashCode()), but it's considered thread-safe.

Let me define "apparently thread-safe" to mean that the collection itself appears to work in a thread-safe manner from the outside. For instance, if the get method is supposed to block until an element is put in, then you should expect that (a) it will return reasonably soon after an element is put into the collection (i.e., almost instantly) and (b) the result will never be null (let's assume the put method doesn't accept nulls).

Here's what I came up with:

public class Holder<E> {
  private E unsafe;
  private volatile E safe;

  public void put(E element) {
    if (element == null) throw new NullPointerException();
    this.unsafe = element;
    this.safe = element;
  }

  public E get() {
    E result = unsafe;
    // We weren't lucky enough to get the unsafe reference
    while (result == null) {
      result = safe;
    }
    return result;
  }
}

So, my first step is to demonstrate that it's apparently thread-safe (as defined above). To do that, I have to demonstrate that get() will return a non-null reference soon after an element.

The get() method starts by grabbing the value of unsafe and assign it to result. At this point, result may or may not be null. If it is, the code reads safe in a busy-wait loop until it's not. The nature of volatile fields tells us that the JVM will update this state reasonably soon after the write to safe in put. Once result is null, we never assign it to anything else. Actions within a thread always appear sequential, and since result is thread-local, we're guaranteed that it won't flip back to a non-null state.

(Side note: the JVM isn't actually obligated by the JLS to update the state any time soon! It could decide to park the thread for a year, for instance, or order the read to safe after a year's worth of actions in other threads. But in practice, people expect volatile fields to get published essentially instantly, and for them to not be would be a serious performance bug in the JVM. It's similar to how the spec doesn't say how quickly i++ has to happen, but there'd be hell to pay if it weren't nearly instantaneous.)

Okay, so get() returns a non-null value almost-instantly after put. Now I need to demonstrate that this publication is unsafe. This is actually pretty easy! Imagine that result = unsafe assigns a non-null value in that first line of get. If that happens, the body of the while loop will never be executed, meaning that thread B will never read the volatile variable safe. This means there is no happens-before relationship between any of thread B's actions and any of thread A's!

Essentially, thread B got the reference that A unsafely published, but it did so purely out of the kindness of the compiler and CPU's cache. That kindness doesn't come with any other guarantees, such as that the memory referenced by that object has also been flushed. And so, the compiler/cache/whatever is perfectly free to reorder the assignment foo.val = 5 to happen after thread B reads foo.val, which would then have its default value of 0.

I wonder how far this could go. I suspect that it can't go very far: even something as simple as a singly-linked list would get into trouble, as actions like removing a node would be fraught in data races that would be hard or impossible to detect and compensate for in the code (as we did by resorting to the spin on result = safe above).

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.