ID: IEP-70
Author:
Sponsor:
Created:
Status: DRAFT


Motivation

Cache async operations invoke future listeners on Striped pool threads, which can cause deadlocks and/or reduce cache performance.

IgniteFuture<Void> fut = cache.putAsync(1, 1);
fut.listen(f -> {
    // Executes on a Striped pool thread and deadlocks:
    // the synchronous call waits for the stripe that is running this listener.
    cache.replace(1, 2);
});

Users are expected to be aware of this and handle it manually. However:

  • This behavior is unexpected.
  • Users have to read the docs carefully to learn about it.
  • Handling this manually is verbose and error-prone.
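
A minimal sketch of what the manual handling looks like, using only JDK classes so that it runs without Ignite. The names here are assumptions for illustration: `CompletableFuture` stands in for `IgniteFuture`, a thread named `stripe-0` stands in for a Striped pool thread, and `user-0` is an application-owned pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** JDK-only illustration; CompletableFuture stands in for IgniteFuture. */
final class ManualOffloadSketch {
    /** Returns the name of the thread that ran the continuation. */
    static String continuationThread(boolean offload) {
        // Hypothetical stand-ins for a Striped pool thread and an application pool.
        ExecutorService stripe = Executors.newSingleThreadExecutor(r -> new Thread(r, "stripe-0"));
        ExecutorService userPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "user-0"));

        try {
            CompletableFuture<Void> op = new CompletableFuture<>();

            // Register the listener before the operation completes.
            CompletableFuture<String> ran = offload
                ? op.thenApplyAsync(v -> Thread.currentThread().getName(), userPool)
                : op.thenApply(v -> Thread.currentThread().getName());

            // The "stripe" thread completes the operation.
            CompletableFuture.runAsync(() -> op.complete(null), stripe).join();

            return ran.join();
        }
        finally {
            stripe.shutdown();
            userPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline:    " + continuationThread(false));
        System.out.println("offloaded: " + continuationThread(true));
    }
}
```

Without offloading, the continuation runs on `stripe-0`, the thread that completed the operation. With offloading, it runs on `user-0`. Every call site has to remember to do this, which is the verbosity the list above refers to.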

The problem is more pronounced in Ignite.NET:

  • async/await has existed for a long time, and most code bases are async.
  • async/await syntactic sugar makes the thread switch less obvious.
  • Custom thread pools are less common.

await cache.PutAsync(1, 1);
// Now we are on a Striped pool thread!

// CPU-heavy method blocks the stripe and cache ops are stalled.
RunSomething();


A similar problem exists for Compute. Async operation continuations are executed on the Public pool, which can lead to starvation there when all threads are taken up by continuation logic.

Description

  • Add IgniteConfiguration#asyncContinuationExecutor (of type Executor).
  • Use ForkJoinPool#commonPool by default (when null / not set).
  • Use this executor for all Cache and Compute async continuations
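
The resolution rule in the list above can be sketched as follows. This is a hypothetical helper written against JDK classes only, not the actual Ignite implementation; `ContinuationExecutorSketch`, `resolve`, and `listen` are illustrative names, and `CompletableFuture` again stands in for Ignite futures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

/** Hypothetical helper; not part of the Ignite API. */
final class ContinuationExecutorSketch {
    /** Returns the configured executor, or the common pool when none is set. */
    static Executor resolve(Executor configured) {
        return configured != null ? configured : ForkJoinPool.commonPool();
    }

    /** Chains a listener so that it runs on the resolved executor, not on the completing thread. */
    static <T> CompletableFuture<Void> listen(CompletableFuture<T> op, Executor configured, Runnable lsnr) {
        return op.thenRunAsync(lsnr, resolve(configured));
    }
}
```

The point of the design is that the thread completing the operation only hands the listener to the executor and returns immediately, so user code never runs on it.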

This fixes the issue in Java, .NET and C++, because thick integrations use direct JNI callbacks for Futures.

NOTE: This IEP is NOT related to scan query filters, cache entry processors, etc., which also run on the Striped pool.

Risks and Assumptions

  • Some users may already have custom code to deal with the problem.
  • Some users run simple continuations that work fine on the Striped/Public pool.

Those users can force the old behavior with `IgniteConfiguration.setAsyncContinuationExecutor(Runnable::run)`.
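
`Runnable::run` works here because, as an `Executor`, it runs each task immediately on the thread that calls `execute`, so the continuation stays on the thread that completed the operation. A small JDK-only check of that property (the class and method names are illustrative):

```java
import java.util.concurrent.Executor;

final class DirectExecutorSketch {
    /** Returns true if the executor runs the task on the thread that calls execute(). */
    static boolean runsOnCallerThread(Executor exec) {
        Thread caller = Thread.currentThread();
        Thread[] ranOn = new Thread[1];

        exec.execute(() -> ranOn[0] = Thread.currentThread());

        return ranOn[0] == caller;
    }

    public static void main(String[] args) {
        Executor direct = Runnable::run;

        System.out.println(runsOnCallerThread(direct)); // prints "true"
    }
}
```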

Performance

Executing a continuation on a different thread involves some overhead. A local benchmark with an integer key and value shows a ~6% throughput drop (see JmhCacheAsyncListenBenchmark in the PoC).

In a real-world workload the difference should be insignificant.

Benchmark                          Mode  Cnt      Score      Error  Units
JmhCacheAsyncListenBenchmark.put  thrpt   10  77859.584 ± 2071.196  ops/s (before)
JmhCacheAsyncListenBenchmark.put  thrpt   10  73393.986 ± 1336.420  ops/s (after)


Discussion Links

IEP thread: http://apache-ignite-developers.2346864.n4.nabble.com/IEP-70-Async-Continuation-Executor-td51775.html

Original discussions: 


Reference Links

PoC: https://github.com/apache/ignite/pull/8870

Tickets

