Import ConcurrencyKit for atomics and containers

What does this mean for platform support?

Currently the only difference is MIPS, which CK does not yet support natively. CK now has native PPC64LE support and supports Aarch64 through GCC builtins. Per sbahra, there are plans to move Aarch64 to native inline assembly and to add native MIPS support.

What atomic/concurrency features would we use? What would the usage look like?

We should replace all of our load/store, add/sub, and CAS (compare-and-swap) functions with their CK equivalents. CK also provides many additional synchronization primitives that we do not have; we need to evaluate their use case by case.
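As a rough sketch of what the usage could look like, the wrapper below maps a few hypothetical `app_atomic_*` helpers (names invented for illustration, not functions from our tree) onto CK's `ck_pr_*` integer primitives when `ck_pr.h` is on the include path. When CK is not available it falls back to GCC `__atomic` builtins so the snippet still compiles; that fallback assumes GCC 4.7 or later and is only a stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Detect CK at compile time; __has_include needs a reasonably recent compiler. */
#if defined(__has_include)
#  if __has_include(<ck_pr.h>)
#    include <ck_pr.h>
#    define APP_HAVE_CK 1
#  endif
#endif

#ifdef APP_HAVE_CK
/* Thin wrappers over CK's typed atomic primitives. */
static inline int  app_atomic_load(int *p)             { return ck_pr_load_int(p); }
static inline void app_atomic_store(int *p, int v)      { ck_pr_store_int(p, v); }
static inline void app_atomic_add(int *p, int d)        { ck_pr_add_int(p, d); }
static inline void app_atomic_sub(int *p, int d)        { ck_pr_sub_int(p, d); }
static inline bool app_atomic_cas(int *p, int o, int n) { return ck_pr_cas_int(p, o, n); }
#else
/* Stand-in for builds without CK (assumption: GCC >= 4.7 or Clang). */
static inline int  app_atomic_load(int *p)        { return __atomic_load_n(p, __ATOMIC_RELAXED); }
static inline void app_atomic_store(int *p, int v) { __atomic_store_n(p, v, __ATOMIC_RELAXED); }
static inline void app_atomic_add(int *p, int d)   { (void)__atomic_fetch_add(p, d, __ATOMIC_SEQ_CST); }
static inline void app_atomic_sub(int *p, int d)   { (void)__atomic_fetch_sub(p, d, __ATOMIC_SEQ_CST); }
static inline bool app_atomic_cas(int *p, int o, int n)
{
    /* Succeeds only if *p still equals o; then *p becomes n. */
    return __atomic_compare_exchange_n(p, &o, n, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
#endif

/* Example caller: raise *p to candidate unless it is already larger. */
static inline void app_atomic_update_max(int *p, int candidate)
{
    assert(p != NULL);
    int seen = app_atomic_load(p);
    /* Retry the CAS until it succeeds or another thread stored a larger value. */
    while (seen < candidate && !app_atomic_cas(p, seen, candidate)) {
        seen = app_atomic_load(p);
    }
}
```

Keeping callers behind one small wrapper like this would let us swap the backing implementation in a single place while we evaluate CK.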

What containers would we use? What would the usage look like?

Immediately I see us replacing our freelist with ck_stack. Other containers should be evaluated on a case-by-case basis.
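To make the freelist idea concrete, here is a minimal sketch of a freelist built on a lock-free LIFO stack. It is not ck_stack and does not use CK's API; it is a plain Treiber-style stack written with GCC `__atomic` builtins (assumption: GCC 4.7 or later), and the `freelist_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each free block stores the link to the next free block in its first bytes. */
struct fl_node {
    struct fl_node *next;
};

struct freelist {
    struct fl_node *head;
};

static void freelist_init(struct freelist *fl)
{
    fl->head = NULL;
}

/* Return a block to the freelist (safe with concurrent pushers). */
static void freelist_push(struct freelist *fl, struct fl_node *n)
{
    assert(n != NULL);
    struct fl_node *old = __atomic_load_n(&fl->head, __ATOMIC_RELAXED);
    do {
        n->next = old; /* on CAS failure, old is refreshed with the current head */
    } while (!__atomic_compare_exchange_n(&fl->head, &old, n, true,
                                          __ATOMIC_RELEASE, __ATOMIC_RELAXED));
}

/* Take a block from the freelist, or NULL if it is empty.
 * NOTE: with multiple concurrent poppers this naive pop is exposed to the
 * ABA problem; a production stack has to address that. */
static struct fl_node *freelist_pop(struct freelist *fl)
{
    struct fl_node *old = __atomic_load_n(&fl->head, __ATOMIC_ACQUIRE);
    struct fl_node *next;
    do {
        if (old == NULL) {
            return NULL;
        }
        next = old->next;
    } while (!__atomic_compare_exchange_n(&fl->head, &old, next, true,
                                          __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE));
    return old;
}
```

The ABA caveat on pop is one reason to prefer a maintained implementation over hand-rolled code: ck_stack offers several push/pop variants with different producer/consumer guarantees, and we would need to pick the one that matches how the freelist is actually used.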

How does using CK containers differ from using (for example) STL containers?

Memory allocation is different. For example, ck_ht (hash table) takes an allocator, but it additionally puts all the malloc/free usage into one function (ck_ht_gc), so that we can control which thread it runs in, how often it runs, and how much work it does: http://concurrencykit.org/doc/ck_ht_gc.html

The current consensus is that we can't use C++11 concurrency because compiler support is not there in RHEL 6. We still need to confirm how much of C++11 gcc 4.4.7 supports: https://gcc.gnu.org/gcc-4.4/cxx0x_status.html

How does performance compare? (GCC builtins vs inline asm)

<sbahra> The inline stuff is much better for multiple reasons.
<sbahra> First off, in terms of correctness, the atomicity semantics of volatile variables is actually a silent contract. There is no standard document that people rely on.
<sbahra> In rare cases (for example, bit fields and so forth), this can be violated without clear warning.
<sbahra> Second, especially in terms of performance, the atomic operations in GCC are extremely heavy-weight.
<sbahra> Most of them imply a full memory barrier, even though a lot of data structures / operations / etc... will not need them.
<sbahra> And then the memory barrier interface itself consists of one thing: __sync_synchronize, a full barrier. On RMO architectures (such as aarch64), this means that even store / load barriers will emit full blown barriers on top of the barriers already emitted by the atomic operations.
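
The barrier point can be illustrated with a small sketch. The first publisher below uses the legacy style described in the chat: a volatile store plus `__sync_synchronize`, which is always a full barrier. The second asks only for the ordering it needs (release on the store, acquire on the load). The ordering-specific versions use GCC `__atomic` builtins, which assume GCC 4.7 or later and so are not available with gcc 4.4.7; they stand in here for the finer-grained fences a library such as CK provides. The `mailbox` type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mailbox {
    int payload; /* data being handed over */
    int ready;   /* set to 1 once payload is valid */
};

/* Legacy style: full barrier, then a volatile store to the flag. */
static void publish_sync(struct mailbox *m, int value)
{
    m->payload = value;
    __sync_synchronize();           /* full barrier, although only the ordering
                                       of the two stores matters here */
    *(volatile int *)&m->ready = 1; /* relies on the unwritten assumption that
                                       an aligned int store is atomic */
}

/* Ordering-specific style: the store itself carries release semantics. */
static void publish_release(struct mailbox *m, int value)
{
    m->payload = value;
    __atomic_store_n(&m->ready, 1, __ATOMIC_RELEASE);
}

/* Consumer: the acquire load pairs with the release store above. */
static bool try_consume(struct mailbox *m, int *out)
{
    assert(out != NULL);
    if (!__atomic_load_n(&m->ready, __ATOMIC_ACQUIRE)) {
        return false; /* nothing published yet */
    }
    *out = m->payload;
    return true;
}
```

On a weakly ordered CPU such as aarch64, the release/acquire pair can compile to cheaper instructions than a full barrier, which is the cost difference being described above.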
