Java 8 LongAdders: The Right Way To Manage Concurrent Counters

By Tal Weiss —  April 16, 2014 — 5 Comments


I just love new toys, and Java 8 has a bunch of them. This time around I want to talk about one of my favourites – concurrent adders. This is a new set of classes for managing counters written and read by multiple threads. The new API promises significant performance gains, while still keeping things simple and straightforward.

As people have been managing concurrent counters since the dawn of multi-core architectures, let’s take a look and see what are some of the options Java offered up until now, and how they perform compared to this new API.

Dirty counters – this approach means you’re writing to / reading from a regular object field or static field across multiple threads. Unfortunately, this doesn’t work for two reasons. The first is that in Java, an A += B operation isn’t atomic. If you open up the compiled bytecode, you’ll see at least four instructions – one for loading the field value from the heap into the thread stack, a second for loading the delta, a third to add them and a fourth to set the result into the field.

If more than one thread is doing this at the same time for the same memory location, you run a high chance of losing a write operation, as one thread can overwrite the value written by another (the classic read-modify-write race). There’s also another nasty angle to this which has to do with the volatility of the value. More on that below.
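Here’s a minimal sketch of the problem (the class and field names are mine, not from any real benchmark): two threads bumping a plain field, and a final value that usually comes up short.

```java
public class DirtyCounter {

    static long counter = 0; // plain field: no synchronization, no volatile

    public static void main(String[] args) throws InterruptedException {
        Runnable writer = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                counter += 1; // read-modify-write: not atomic
            }
        };

        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Expected 2,000,000 - in practice this usually prints less,
        // because concurrent increments overwrite each other.
        System.out.println("counter = " + counter);
    }
}
```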

This is such a rookie mistake, and one that’s super hard to debug. If you do run across anybody doing this in your app, I’d like to ask a small favor. Run a search across your database for “Tal Weiss”. If you see me there – delete my records. I’ll feel safer.

Synchronized – the most basic of concurrency idioms, this blocks all other threads while reading or writing the value. While it works, it’s a sure-fire way of turning your code into a DMV line.
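For reference, a synchronized counter looks something like this – a minimal sketch, with illustrative names; it’s correct, but every read and write serializes all threads on the same monitor:

```java
public class SynchronizedCounter {

    private long value = 0;

    // Only one thread at a time can enter either method
    public synchronized void add(long delta) {
        value += delta;
    }

    public synchronized long get() {
        return value;
    }
}
```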

RWLock – this slightly more sophisticated version of the basic Java lock lets you distinguish between threads that change the value and need to block others vs. ones that only read and don’t require a critical section. While this can be more efficient (assuming the number of writers is low), it’s a pretty meh approach, as you’re still blocking the execution of all other threads whenever the write lock is acquired. It’s really only a good approach when you know the number of writers is materially limited compared to readers.
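A read/write-lock version looks something like this – a minimal sketch assuming readers vastly outnumber writers (class and method names are illustrative):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWLockCounter {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long value = 0;

    public void add(long delta) {
        lock.writeLock().lock();   // blocks readers and other writers
        try {
            value += delta;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public long get() {
        lock.readLock().lock();    // multiple readers may hold this at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```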

Volatile – this fairly misunderstood keyword essentially instructs the JIT compiler to skip some of its usual optimizations on the run-time machine code (such as caching the field value in a register), so that any modification to the field is immediately visible to other threads.

This invalidates one of the JIT compiler’s favorite optimizations – playing with the order in which assignments are applied to memory. Come again you say? You heard me. The JIT compiler may change the order in which assignments to fields are made, as long as the reordering is invisible to the executing thread (what other threads are allowed to observe is governed by the Java memory model’s happens-before rules). This arcane little strategy allows it to minimize the number of times the program needs to access the global heap, while still making sure your code is unaffected by it. Pretty sneaky…

So when should I use volatile counters? If you have just one thread updating a value and multiple threads consuming it, this is a really good strategy – no contention at all.
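A minimal sketch of that single-writer / multiple-reader pattern (names are illustrative):

```java
public class VolatileCounter {

    private volatile long value = 0;

    // Called from exactly ONE thread - the read-modify-write is safe only
    // because no other thread ever writes this field.
    public void increment() {
        value++;
    }

    // Safe to call from any number of reader threads; volatile guarantees
    // they see the most recently written value.
    public long get() {
        return value;
    }
}
```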

So why not use it always, you ask? Because this doesn’t work well when more than one thread is updating the field. Since A += B is not atomic, you’re running the risk of overwriting somebody else’s write. Up until Java 8, the answer for this was to use an AtomicInteger.

AtomicInteger – this family of classes (AtomicInteger, AtomicLong and friends) uses CAS (compare-and-swap) processor instructions to update the value of the counter. Sounds great, doesn’t it? Well, yes and no. This works well as it utilizes a direct machine code instruction to set the value with minimum effect on the execution of other threads. The downside is that if it fails to set the value due to a race with another thread, it has to try again. Under high contention this can turn into a spin lock, where the thread keeps retrying in a loop until it succeeds. This isn’t quite what we were looking for. Enter Java 8 with LongAdders.
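To make that retry risk concrete, here’s a minimal sketch of an atomic counter with the CAS loop spelled out explicitly – the explicit loop roughly mirrors what incrementAndGet() does for you internally (names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {

    private final AtomicInteger value = new AtomicInteger(0);

    // The idiomatic one-liner:
    public int increment() {
        return value.incrementAndGet();
    }

    // Roughly equivalent explicit CAS loop - under heavy contention this
    // keeps spinning until compareAndSet() wins the race.
    public int incrementWithExplicitCas() {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}
```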

Java 8 Adders – this is such a cool new API I just can’t stop gushing about it ♥♥♥. From a usage perspective it’s very similar to an AtomicInteger. Simply create a LongAdder and use add() to update the value and sum() (or intValue() / longValue()) to read it. The magic happens behind the scenes.

When a straight CAS on the shared value fails due to contention, the class stores the delta in an internal cell object allocated for the contending thread. When sum() (or intValue()) is called, it adds the values of all pending cells to the base sum. This reduces the need to go back and CAS or block other threads. Pretty smart stuff!
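Usage-wise it boils down to something like this (a minimal sketch; names are illustrative):

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderCounter {

    private final LongAdder counter = new LongAdder();

    public void add(long delta) {
        counter.add(delta);   // under contention the delta lands in a per-thread cell
    }

    public long get() {
        return counter.sum(); // sums the base value plus all pending cells
        // intValue() / longValue() are also available and call sum() internally
    }
}
```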

So alright, enough talking – let’s see this puppy in action. We’ve set up the following benchmark: increment a counter across multiple threads up to 10^8. We ran the benchmark with a total of ten threads – five for writing and five for reading. The target machine has an i7 processor with 4 cores, so we’re bound to have some serious contention here:
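This isn’t the actual benchmark code (that’s linked below), but as a rough sketch of the shape of the test – 5 writers, 5 readers, counting up to 10^8, everything else assumed – here’s the LongAdder variant:

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderBenchmarkSketch {

    static final long TARGET = 100_000_000L; // 10^8
    static final LongAdder counter = new LongAdder();

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();

        Thread[] threads = new Thread[10];
        for (int i = 0; i < 5; i++) {          // 5 writer threads
            threads[i] = new Thread(() -> {
                while (counter.sum() < TARGET) {
                    counter.increment();
                }
            });
        }
        for (int i = 5; i < 10; i++) {         // 5 reader threads
            threads[i] = new Thread(() -> {
                while (counter.sum() < TARGET) {
                    // readers just keep polling the current value
                }
            });
        }

        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();

        System.out.println("Took " + (System.currentTimeMillis() - start) + " ms");
    }
}
```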

[Benchmark results table: time to count to 10^8 for each counter implementation, with 5 writer and 5 reader threads]

** The code is available here

Notice that both dirty and volatile risk some serious value overwrites.

The Bottom Line

  • Concurrent Adders clean house with a 60-100% performance boost over atomic integers.
  • Adding threads didn’t make much of a difference, except when locking.
  • Notice the huge performance penalty you get for using synchronized or RW-locks – one or even two orders of magnitude slower!

 

If you’ve already had the chance to use these classes in your code – I’d love to hear about it.

* Additional reading - Brian Goetz on Java concurrency.

Questions, comments? We’ve got a spiffy comments section below just for that.


Tal Weiss


Tal is the CEO of Takipi. Tal has been designing scalable, real-time Java and C++ applications for the past 15 years. He still enjoys analyzing a good bug and instrumenting code. In his free time Tal plays Jazz drums.
  • Alexandre Verri

    Hello Tal, thank you for sharing this information. One thing that caught my attention is the synchronized method having a better performance compared to RWLock. I was expecting the opposite result.

    • http://www.takipi.com/ Tal Weiss

      Hey Alexandre,

      Thanks for the comment. I was pretty surprised by it as well.

      It seems that when the number of writers is fairly high, you actually get lower throughput than with a straight synchronized section. The results we’ve seen have been consistent. If someone else runs the test and sees different results, I’d love to learn about it.

      • Pierre Laporte

        RWLocks have a serious overhead compared to synchronized and can also suffer from readers/writers starvation, depending on the JDK version you use. As a rule of thumb, you need the code in the read lock to execute for about 2000 clock cycles in order to win back the cost of using it.

        Note that your benchmark is flawed. You should post a link to this article on either the “Friends of jClarity” mailing list or “Mechanical Sympathy” to get all the details, and rewrite it using the Java Microbenchmark Harness (JMH).

        Also, there is a new lock class in Java8 (StampedLock) which is a better implementation than RWLock with less overhead, you should have a look at the new idioms introduced with this class.

        • http://www.takipi.com/ Tal Weiss

          Hi Pierre,

          Thanks for the comment. We’ll add StampedLock to the benchmark as well. You mentioned a flaw in the benchmark, I’d love to know how to improve it, so feel free to post here or shoot me a note to tal.weiss@takipi.com.

  • Akash

    Surprising to see that atomic is giving better results than volatile! … As atomic internally uses a volatile variable to store the value… Doesn’t really make sense :(