Repair Async API
Repair used to be invoked through a synchronous JMX operation, but because repair can take a long time to finish, the JMX connection sometimes timed out. CASSANDRA-4767 therefore added an asynchronous repair API: once repair has been invoked, users can track its progress through JMX notifications.
Repair JMX Notification
Repair JMX notifications are sent from the StorageService MBean (org.apache.cassandra.db:type=StorageService).
Subscribe to these notifications before you start repair; otherwise you may miss some of the messages.
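To make the subscribe-first rule concrete, here is a minimal sketch using the standard javax.management remote API. It assumes the usual RMI connector URL form and no JMX authentication; the class RepairNotificationSubscriber and its methods are hypothetical. The call that actually starts repair is left as a comment, since its exact operation name and parameters are not described here.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.Notification;
import javax.management.NotificationFilterSupport;
import javax.management.NotificationListener;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: queues "repair" notifications emitted by the StorageService MBean. */
class RepairNotificationSubscriber {
    static final String STORAGE_SERVICE = "org.apache.cassandra.db:type=StorageService";

    private final BlockingQueue<Notification> received = new LinkedBlockingQueue<>();
    // The listener only enqueues; consumers take from received() on their own thread.
    private final NotificationListener listener = (notification, handback) -> received.add(notification);

    BlockingQueue<Notification> received() {
        return received;
    }

    /** Registers the listener with a filter that accepts only notifications of type "repair". */
    void subscribe(MBeanServerConnection conn)
            throws IOException, InstanceNotFoundException, MalformedObjectNameException {
        NotificationFilterSupport repairOnly = new NotificationFilterSupport();
        repairOnly.enableType("repair");
        conn.addNotificationListener(new ObjectName(STORAGE_SERVICE), listener, repairOnly, null);
    }

    /** Connects, subscribes first, then prints repair messages until interrupted. */
    static void listen(String host, int port) throws Exception {
        // Assumption: RMI connector at the conventional /jmxrmi path, no JMX authentication.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            RepairNotificationSubscriber subscriber = new RepairNotificationSubscriber();
            subscriber.subscribe(connector.getMBeanServerConnection());
            // Only after subscribe() has returned, invoke the async repair operation on the
            // same MBean; its return value is the command number (call not shown here).
            while (true) {
                Notification n = subscriber.received().take();
                System.out.println(n.getMessage());
            }
        }
    }
}
```

For a local node this could be called as `RepairNotificationSubscriber.listen("127.0.0.1", 7199)`, 7199 being Cassandra's default JMX port. Filtering on the type at registration time keeps unrelated StorageService notifications out of the queue.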
A repair JMX notification contains the following fields:
field | content
type | "repair"
message | repair status message
user data | int array containing command number and repair status
message is a repair status message such as "Starting repair ...", or an error message.
user data is an int array of two elements. The first element is the command number, which is assigned uniquely each time repair is invoked through the async API; the async API methods return this command number. The second element is the repair status number, listed below:
value | status | description
0 | STARTED | repair command started
1 | SESSION_SUCCESS | repair session (repair for one range in a keyspace) succeeded
2 | SESSION_FAILED | repair session failed
3 | FINISHED | repair command finished
(In the code, these are defined in the ActiveRepairService.Status enum.)
The nodetool repair command also uses these statuses to track repair progress.
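The two tables can be combined into a small client-side decoder. This is a sketch assuming the status numbers follow the order listed; RepairStatus is a hypothetical mirror written for the client, not Cassandra's ActiveRepairService.Status class, and RepairEvent is likewise hypothetical.

```java
import java.util.Optional;
import javax.management.Notification;

/** Hypothetical client-side mirror of the status table: ordinal() equals the status number. */
enum RepairStatus { STARTED, SESSION_SUCCESS, SESSION_FAILED, FINISHED }

/** Hypothetical value type holding the decoded fields of one repair notification. */
record RepairEvent(int commandNumber, RepairStatus status, String message) {

    /** Returns empty for notifications whose type is not "repair"; throws on malformed user data. */
    static Optional<RepairEvent> from(Notification n) {
        if (!"repair".equals(n.getType())) {
            return Optional.empty();
        }
        // user data is int[2]: {command number, repair status number}
        if (!(n.getUserData() instanceof int[] data) || data.length < 2) {
            throw new IllegalArgumentException("unexpected repair user data: " + n.getUserData());
        }
        RepairStatus[] statuses = RepairStatus.values();
        if (data[1] < 0 || data[1] >= statuses.length) {
            throw new IllegalArgumentException("unknown repair status number: " + data[1]);
        }
        return Optional.of(new RepairEvent(data[0], statuses[data[1]], n.getMessage()));
    }
}
```

Rejecting unknown status numbers, rather than silently ignoring them, makes a mismatch between this mirror and the server's enum visible early.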
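A client can follow repair progress the same way. The sketch below is not nodetool's implementation; RepairProgressTracker is a hypothetical listener that keeps per-command counters, keyed by the command number in the user data, and lets the caller block until FINISHED arrives. The status numbers are taken from the table above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.Notification;
import javax.management.NotificationListener;

/** Hypothetical listener that records repair progress for every command number it sees. */
class RepairProgressTracker implements NotificationListener {
    // Status numbers from the status table (0 = STARTED needs no bookkeeping here).
    private static final int SESSION_SUCCESS = 1;
    private static final int SESSION_FAILED = 2;
    private static final int FINISHED = 3;

    /** Per-command state, created on the first notification or the first query. */
    private static final class CommandState {
        final AtomicInteger succeeded = new AtomicInteger();
        final AtomicInteger failed = new AtomicInteger();
        final CountDownLatch finished = new CountDownLatch(1);
    }

    private final ConcurrentMap<Integer, CommandState> commands = new ConcurrentHashMap<>();

    private CommandState state(int command) {
        return commands.computeIfAbsent(command, c -> new CommandState());
    }

    @Override
    public void handleNotification(Notification n, Object handback) {
        if (!"repair".equals(n.getType())) {
            return;
        }
        if (!(n.getUserData() instanceof int[] data) || data.length < 2) {
            return; // not the int[] {command number, status number} layout
        }
        CommandState s = state(data[0]);
        switch (data[1]) {
            case SESSION_SUCCESS -> s.succeeded.incrementAndGet();
            case SESSION_FAILED -> s.failed.incrementAndGet();
            case FINISHED -> s.finished.countDown();
            default -> { } // STARTED or an unknown status: nothing to count
        }
    }

    /** Blocks until FINISHED arrives for the command; returns false if the timeout elapses first. */
    boolean awaitFinished(int command, long timeout, TimeUnit unit) throws InterruptedException {
        return state(command).finished.await(timeout, unit);
    }

    int sessionsSucceeded(int command) {
        return state(command).succeeded.get();
    }

    int sessionsFailed(int command) {
        return state(command).failed.get();
    }
}
```

Keying state by command number fits the subscribe-first rule: register the tracker before invoking repair, then call awaitFinished with the command number the async call returns. Notifications that arrive before that call returns are not lost, because they are already stored under their command number.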
Further improvement
Still, repair status is tracked at a coarse granularity. Repair involves several nodes that perform validation compaction and file streaming, and those activities can only be monitored separately on each node, through nodetool compactionstats and nodetool netstats.
One possible way to track the whole repair process is repair tracing.