...

Hanging Transactions not Related to Deadlock

Description

This situation can occur if the user explicitly demarcates a transaction (especially a Pessimistic Repeatable Read one) and, for example, calls a remote service (which may be unresponsive) after acquiring some locks. All other transactions that depend on the same keys will hang.
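The scenario can be simulated without a cluster. The sketch below is only an illustration under stated assumptions: a plain ReentrantLock stands in for a pessimistic key lock, Thread.sleep stands in for the unresponsive remote service, and the class and method names (HangingTxDemo, secondTxAcquires) are hypothetical. It shows that a second "transaction" on the same key cannot proceed while the first one is still inside the remote call, and that a bounded wait at least turns the hang into a detectable failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class HangingTxDemo {
    /**
     * Returns true if a second "transaction" gets the key lock within waitMs
     * while the first one holds it for the duration of a slow remote call.
     */
    static boolean secondTxAcquires(long remoteCallMs, long waitMs) {
        ReentrantLock keyLock = new ReentrantLock(); // stand-in for a pessimistic key lock
        CountDownLatch locked = new CountDownLatch(1);

        Thread first = new Thread(() -> {
            keyLock.lock();
            try {
                locked.countDown();
                Thread.sleep(remoteCallMs); // stand-in for the unresponsive remote service
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                keyLock.unlock();
            }
        }, "first-tx");
        first.setDaemon(true);
        first.start();

        try {
            locked.await(); // make sure the first transaction owns the lock
            boolean acquired = keyLock.tryLock(waitMs, TimeUnit.MILLISECONDS);
            if (acquired)
                keyLock.unlock();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.interrupt(); // stop the simulated remote call
        }
    }

    public static void main(String[] args) {
        System.out.println("slow remote call, second tx acquired: " + secondTxAcquires(5000, 200));
        System.out.println("fast remote call, second tx acquired: " + secondTxAcquires(50, 5000));
    }
}
```

With an unbounded lock() instead of tryLock, the second thread would wait for as long as the remote call lasts, which is the hang described above.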

...

There should also be a screen in Web Console that lists all ongoing transactions in the cluster, including the information above.

Java Level Deadlocks

Description

This situation occurs when user code or Ignite itself runs into a Java-level deadlock caused by a bug in the code: synchronized (mux1) { synchronized (mux2) {} } sections entered in reverse order, reentrant locks acquired in reverse order, etc.
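A deadlock of this kind can be reproduced and detected with the standard JDK management API. The sketch below is not Ignite code: the class and method names (DeadlockProbe, startReverseOrderThreads, countDeadlockedThreads) are hypothetical, and the only library call relied on is ThreadMXBean.findDeadlockedThreads() from java.lang.management. Two daemon threads take mux1 and mux2 in opposite order, and a poller asks the JVM which threads are deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockProbe {
    /** Starts two daemon threads that take the two monitors in opposite order. */
    static void startReverseOrderThreads(Object mux1, Object mux2) {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread a = new Thread(() -> lockInOrder(mux1, mux2, bothHoldFirst), "probe-a");
        Thread b = new Thread(() -> lockInOrder(mux2, mux1, bothHoldFirst), "probe-b");
        a.setDaemon(true); // daemon threads do not keep the JVM alive once deadlocked
        b.setDaemon(true);
        a.start();
        b.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await(); // wait until the other thread holds its first monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread owns this monitor and waits for ours.
            }
        }
    }

    /** Polls the JVM until it reports deadlocked threads or the timeout expires. */
    static int countDeadlockedThreads(long timeoutMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null)
                return ids.length;
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        startReverseOrderThreads(new Object(), new Object());
        System.out.println("deadlocked threads: " + countDeadlockedThreads(2000));
    }
}
```

findDeadlockedThreads() covers both monitor deadlocks and deadlocks on ownable synchronizers such as ReentrantLock, so the same poller applies to the reentrant-lock case mentioned above.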

...

Ignite Thread Pools Starvation

Description

This situation can occur if the user submits tasks that recursively submit more tasks and synchronously wait for the results. Jobs arrive at worker nodes and are queued forever because the public pool has no free threads: all of them are waiting for job results.
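The effect can be shown with a plain java.util.concurrent executor standing in for the public pool. This is a simplified sketch, not Ignite's compute API, and the names (StarvationDemo, parentCompletes) are hypothetical. A parent task submits a child task to the same pool and blocks on its result; with a single worker thread the child can never start, so the parent never finishes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StarvationDemo {
    /** Returns true if the parent task finished in time, false if the pool starved. */
    static boolean parentCompletes(int poolSize, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<Integer> parent = pool.submit(() -> {
                // The parent occupies a worker thread and submits a child to the same pool.
                Future<Integer> child = pool.submit(() -> 42);
                return child.get(); // synchronous wait: the worker thread stays blocked
            });
            parent.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the child is still queued because no thread is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow(); // interrupts the blocked parent so the pool threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println("pool of 1, parent completes: " + parentCompletes(1, 500));
        System.out.println("pool of 2, parent completes: " + parentCompletes(2, 5000));
    }
}
```

A larger pool only postpones the problem: once the number of waiting parents reaches the pool size, the children are queued behind them in the same way.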

...

Web Console should provide the ability to cancel any task or job from the UI.

Report

Timed-out tasks and jobs should be reported in Web Console and in the logs. We need to introduce a new configuration property that sets the timeout after which jobs are reported.

The log record and Web Console should include:

  1. Master node ID
  2. Start time

GC Pauses

Description

When an Ignite node suffers from GC pauses, it is unresponsive to every other node in the topology.

Detection and Solution

A very good solution using two native threads is described in IGNITE-6171 (ASF JIRA).
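For comparison, pauses can also be observed from inside the JVM. The sketch below uses a different technique from the native-thread approach referenced above: a plain Java daemon thread sleeps for a short interval and measures how much longer than requested the sleep took. All names (PauseDetector, overshootMs, start) are hypothetical. Because a Java thread is itself stopped at a safepoint, this detector can only report a pause after it has ended, which is exactly the limitation that motivates native threads.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

final class PauseDetector {
    /** How many milliseconds longer than requested a sleep took (never negative). */
    static long overshootMs(long startNanos, long endNanos, long sleepMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMs - sleepMs);
    }

    /** Starts a daemon thread that calls onPause whenever a sleep overshoots the threshold. */
    static Thread start(long sleepMs, long thresholdMs, LongConsumer onPause) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                long start = System.nanoTime();
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    return; // detector was stopped
                }
                long pause = overshootMs(start, System.nanoTime(), sleepMs);
                if (pause >= thresholdMs)
                    onPause.accept(pause); // reported only after the pause is over
            }
        }, "pause-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread detector = start(50, 500,
            ms -> System.out.println("Possible JVM pause of ~" + ms + " ms"));
        Thread.sleep(1000); // run the detector for a second, then stop it
        detector.interrupt();
    }
}
```

The overshoot also includes OS scheduling delays, so the threshold should be well above the sleep interval to avoid false reports.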

Report

Native threads should report a GC pause to stdout and, if possible, to a logger instance. Of course, if the policy is set to "kill the node", output via the log is not possible: the native thread will be stuck in a safepoint, and no killing or logging occurs until the safepoint is released.