There are a few tools that can be used to monitor the GatewaySenders and GatewayReceivers:

JMX

JMX can be used to receive Geode severe, error or warning log messages as alerts as well as to access GatewaySenderMBean and GatewayReceiverMBean attributes

Alerts

Alert Level

JMX clients can be configured to receive Geode severe, error or warning log messages as alerts using a JMX NotificationListener.

The default is alert level is severe, but it is configurable. To change the alert level, invoke the DistributedSystemMBean changeAlertLevel operation.

Here are examples of modifying the alert level using JConsole and Java.

JConsole

To view the alert level:

View alert level using JConsole

To change the alert level:

Java

The attached AlertNotificationListener JMX client changes the alert level using code like:

JMXServiceURL url = new JMXServiceURL("service:jmx:rmi://localhost/jndi/rmi://localhost:1092/jmxrmi");
JMXConnector jmxConnector = JMXConnectorFactory.connect(url);
MBeanServerConnection mbs = jmxConnector.getMBeanServerConnection();
ObjectName distributedSystem = new ObjectName("GemFire:service=System,type=Distributed");
mbs.invoke(distributedSystem, "changeAlertLevel", new String[] {"warning"}, new String[] {"java.lang.String"})

Notifications

Once the alert level has been set to 'warning', then warning log messages will be received by any JMX NotificationListener in the form of Notifications.

Here are some example using JConsole and Java.

JConsole

Java

The attached AlertNotificationListener JMX client handles Notifications using code like:

public void handleNotification(Notification notification, Object handback) {
  System.out.println(toString(notification));
}

private String toString(Notification notification) {
  Map userData = (Map) notification.getUserData();
  return new StringBuilder()
    .append(getClass().getSimpleName())
    .append(" received alert [")
    .append("time stamp=")
    .append(new Date(notification.getTimeStamp()))
    .append("; member=")
    .append(userData.get(JMXNotificationUserData.MEMBER))
    .append("; level=")
    .append(userData.get(JMXNotificationUserData.ALERT_LEVEL))
    .append("; thread=")
    .append(userData.get(JMXNotificationUserData.THREAD))
    .append("; message=")
    .append(notification.getMessage())
    .toString();
}

This code will dump output like:

AlertNotificationListener received alert [time stamp=Mon Oct 30 16:28:14 PDT 2017; member=ln-1; level=warning; thread=Event Processor for GatewaySender_ny_2 tid=0x4b; message=ny: Could not connect.
AlertNotificationListener received alert [time stamp=Mon Oct 30 16:28:24 PDT 2017; member=ln-1; level=warning; thread=Event Processor for GatewaySender_ny_2 tid=0x4b; message=During normal processing, unsuccessfully dispatched 1 events (batch #0)
AlertNotificationListener received alert [time stamp=Mon Oct 30 16:28:38 PDT 2017; member=ln-1; level=warning; thread=AckReaderThread for : Event Processor for GatewaySender_ny_2 tid=0x75; message=Gateway Sender ParallelGatewaySender{id=ny,remoteDsId=1,isRunning =true} : Received ack for batch id 0 with one or more exceptions
AlertNotificationListener received alert [time stamp=Mon Oct 30 16:28:38 PDT 2017; member=ln-1; level=warning; thread=AckReaderThread for : Event Processor for GatewaySender_ny_2 tid=0x75; message=The event being processed when the BatchException occurred was:  SenderEventImpl[id=EventIDid=25bytes;threadID=0x10034|1;sequenceID=4;bucketId=52];action=0;operation=CREATE;region=/data;key=4;value=Trade[id=4; cusip=PVTL; shares=31; price=199.12570190429688];valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}];possibleDuplicate=true;creationTime=1509406104471;shadowKey= 165;timeStamp=1509406104471;acked=false;dispatched=false;bucketId=52]
AlertNotificationListener received alert [time stamp=Mon Oct 30 16:28:38 PDT 2017; member=ln-1; level=warning; thread=AckReaderThread for : Event Processor for GatewaySender_ny_2 tid=0x75; message=The event being processed when the BatchException occurred was:  SenderEventImpl[id=EventIDid=25bytes;threadID=0x10016|1;sequenceID=26;bucketId=22];action=0;operation=CREATE;region=/data;key=26;value=Trade[id=26; cusip=PVTL; shares=6; price=967.1017456054688];valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}];possibleDuplicate=true;creationTime=1509406104569;shadowKey= 135;timeStamp=1509406104569;acked=false;dispatched=false;bucketId=22]
AlertNotificationListener received alert [time stamp=Mon Oct 30 16:28:38 PDT 2017; member=ln-1; level=warning; thread=AckReaderThread for : Event Processor for GatewaySender_ny_2 tid=0x75; message=The event being processed when the BatchException occurred was:  SenderEventImpl[id=EventIDid=25bytes;threadID=0x10039|1;sequenceID=9;bucketId=57];action=0;operation=CREATE;region=/data;key=9;value=Trade[id=9; cusip=PVTL; shares=76; price=324.1545104980469];valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}];possibleDuplicate=true;creationTime=1509406104493;shadowKey= 170;timeStamp=1509406104493;acked=false;dispatched=false;bucketId=57]

Note: This example logs every Geode warning, not just ones from GatewaySenders.

GatewaySender Log Messages and Notifications of Interest

Startup

Remote Locator Is Not Running

When the remote locator is not running, the local locator will log a message like below. This message will be logged for each remote locator configured the remote-locators property that is unreachable.

[warning 2017/11/09 11:32:48.667 PST locator-ln <WAN Locator Discovery Thread> tid=0x54] Locator discovery task could not exchange locator information 10.118.20.102[10332] with localhost[10331] after 6 retry attempts. Retrying in 10,000 ms.

And the GatewaySender will log messages like below.

[warning 2017/11/09 11:39:44.314 PST ln-1 <Event Processor for GatewaySender_ny_0> tid=0x47] Remote locator host port information for remote site "1" is not available in local locator "localhost[10332]".
[severe 2017/11/09 11:39:44.317 PST ln-1 <Event Processor for GatewaySender_ny_0> tid=0x47] GatewaySender "ny" could not get remote locator information for remote site "1".

The attached AlertNotificationListener will log alerts like:

AllNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=17; time stamp=1510364889713; message=Remote locator host port information for remote site "1" is not available in local locator "localhost[10332]".; userData={AlertLevel=warning, Member=ln-1, Thread=Event Processor for GatewaySender_ny_0 tid=0x47}]
AllNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=19; time stamp=1510364889715; message=GatewaySender "ny" could not get remote locator information for remote site "1".; userData={AlertLevel=severe, Member=ln-1, Thread=Event Processor for GatewaySender_ny_0 tid=0x47}]
GatewaySender Cannot Connect to GatewayReceiver

When a GatewaySender cannot connect to a remote GatewayReceiver, a message like below is logged.

[warning 2017/11/09 09:28:44.977 PST ln-1 <Event Processor for GatewaySender_ny_3> tid=0x4c] ny: Could not connect.

The attached AlertNotificationListener will log an alert like:

AllNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=9; time stamp=1510362355433; message=ny: Could not connect.; userData={AlertLevel=warning, Member=ln-1, Thread=Event Processor for GatewaySender_ny_3 tid=0x4c}]

Once the GatewaySender does connect to a remote GatewayReceiver, an info message like below is logged. This message contains the remote receiver's host and port as well as the number of failed attempts. Depending on what thread has been attempting to connect, it'll retry. For example, the AckReaderThread retries every second by default.

[info 2017/11/09 09:30:59.511 PST ln-1 <Event Processor for GatewaySender_ny_3> tid=0x4c] ny: Using Pooled Connection to 10.118.20.102:5382: Connection[10.118.20.102:5382]@439092143 after 1 failed connect attempts

or:

[info 2017/11/09 11:15:45.931 PST ln-1 <AckReaderThread for : Event Processor for GatewaySender_ny_3> tid=0x82] ny: Using Pooled Connection to 10.118.20.102:5470: Connection[10.118.20.102:5470]@1533322301 after 100 failed connect attempts
Ongoing
GatewaySender Loses Connection to GatewayReceiver

When GatewaySender loses connection to a remote GatewayReceiver, messages like below are logged.

[warning 2017/11/09 11:13:54.045 PST ln-1 <AckReaderThread for : Event Processor for GatewaySender_ny_3> tid=0x82] Pool unexpected Socket closed connection=Pooled Connection to 10.118.20.102:5382: Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
[warning 2017/11/09 11:13:55.155 PST ln-1 <AckReaderThread for : Event Processor for GatewaySender_ny_3> tid=0x82] ny: Could not connect.

The attached AlertNotificationListener will log alerts like:

AllNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=26; time stamp=1510364435961; message=Pool unexpected closed socket on server connection=Pooled Connection to 10.118.20.102:5181: Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts; userData={AlertLevel=warning, Member=ln-1, Thread=AckReaderThread for : Event Processor for GatewaySender_ny_3 tid=0x82}]
AllNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=29; time stamp=1510364437068; message=ny: Could not connect.; userData={AlertLevel=warning, Member=ln-1, Thread=AckReaderThread for : Event Processor for GatewaySender_ny_3 tid=0x82}]
Batch Exceptions

Exceptions that occur on the GatewayReceiver while processing an event in a batch cause a warning like below to be logged on the GatewayReceiver. The stack trace shows a BatchException70 wrapping the real exception. In this case, a CacheWriterException occurred.

[warning 2017/11/09 16:28:46.265 PST ny-1 <ServerConnection on port 5342 Thread 4> tid=0x6a] Server connection from [identity(10.118.20.102(ln-1:12887)<v1>:1027,connection=1; port=56515]: Wrote batch exception: 
org.apache.geode.internal.cache.wan.BatchException70: Exception occurred while processing a batch on the receiver running on DistributedSystem with Id: 1, DistributedMember on which the receiver is running: 10.118.20.102(ny-1:12879)<v3>:1026
	at org.apache.geode.internal.cache.tier.sockets.command.GatewayReceiverCommand.cmdExecute(GatewayReceiverCommand.java:715)
	...
Caused by: org.apache.geode.cache.CacheWriterException: test cache writer exception
	at TestCacheWriter.process(TestCacheWriter.java:24)
	at TestCacheWriter.beforeCreate(TestCacheWriter.java:14)
	...

The same exception warning is logged on the GatewaySender along with a message identifying the event being processed like below.

[warning 2017/11/09 16:56:36.394 PST ln-1 <AckReaderThread for : Event Processor for GatewaySender_ny_3> tid=0x6a] Gateway Sender ParallelGatewaySender{id=ny,remoteDsId=1,isRunning =true} : Received ack for batch id 0 with one or more exceptions

[warning 2017/11/09 16:56:36.395 PST ln-1 <AckReaderThread for : Event Processor for GatewaySender_ny_3> tid=0x6a] A BatchException occurred processing events. Index of Array of Exception : 0
org.apache.geode.internal.cache.wan.BatchException70: Exception occurred while processing a batch on the receiver running on DistributedSystem with Id: 1, DistributedMember on which the receiver is running: 10.118.20.102(ny-1:13339)<v5>:1026
	at org.apache.geode.internal.cache.tier.sockets.command.GatewayReceiverCommand.cmdExecute(GatewayReceiverCommand.java:715)
	...
Caused by: org.apache.geode.cache.CacheWriterException: test cache writer exception
	at TestCacheWriter.process(TestCacheWriter.java:24)
	at TestCacheWriter.beforeCreate(TestCacheWriter.java:14)
	...

[warning 2017/11/09 16:56:36.407 PST ln-1 <AckReaderThread for : Event Processor for GatewaySender_ny_3> tid=0x6a] The event being processed when the BatchException occurred was:  SenderEventImpl[id=EventIDid=25bytes;threadID=0x10030|1;sequenceID=0;bucketId=48];action=0;operation=CREATE;region=/data;key=0;value=Trade[id=0; cusip=PVTL; shares=32; price=575.0191650390625];valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}];possibleDuplicate=false;creationTime=1510275395589;shadowKey= 161;timeStamp=1510275395587;acked=false;dispatched=false;bucketId=48]

The attached AlertNotificationListener receives these GatewaySender warnings as notifications like:

AlertNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=18; time stamp=1510275396393; message=Gateway Sender ParallelGatewaySender{id=ny,remoteDsId=1,isRunning =true} : Received ack for batch id 0 with one or more exceptions; userData={AlertLevel=warning, Member=ln-1, Thread=AckReaderThread for : Event Processor for GatewaySender_ny_3 tid=0x6a}]
AlertNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=20; time stamp=1510275396394; message=A BatchException occurred processing events. Index of Array of Exception : 0; userData={AlertLevel=warning, Member=ln-1, Thread=AckReaderThread for : Event Processor for GatewaySender_ny_3 tid=0x6a}]
AlertNotificationListener received notification[source=DistributedSystem(2); type=system.alert; sequence number=22; time stamp=1510275396397; message=The event being processed when the BatchException occurred was: SenderEventImpl[id=EventIDid=25bytes;threadID=0x10030|1;sequenceID=0;bucketId=48];action=0;operation=CREATE;region=/data;key=0;value=Trade[id=0; cusip=PVTL; shares=32; price=575.0191650390625];valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}];possibleDuplicate=false;creationTime=1510275395589;shadowKey= 161;timeStamp=1510275395587;acked=false;dispatched=false;bucketId=48]; userData={AlertLevel=warning, Member=ln-1, Thread=AckReaderThread for : Event Processor for GatewaySender_ny_3 tid=0x6a}]

MBean Attributes

Accessing GatewaySender and GatewayReceiver Attributes

JMX clients can also access GatewaySenderMBean and GatewayReceiverMBean attributes.

Here are some example using JConsole and Java.

JConsole

GatewaySender Attributes

To view GatewaySender attributes:

GatewayReceiver Attributes

To view GatewayReceiver attributes:

Java

GatewaySender Attributes

The attached DumpGatewaySenders JMX client dumps GatewaySender attributes using code like:

private void dumpGatewaySenders() throws Exception {
  ObjectName patternName = ObjectName.getInstance("GemFire:service=GatewaySender,gatewaySender=*,type=Member,member=*");
  Set<ObjectName> objectNames = this.mbs.queryNames(patternName, null);
  System.out.println("The distributed system contains the following " + objectNames.size() + " senders:");
  for (ObjectName objectName : objectNames) {
    System.out.println("=======================================================");
    System.out.println(objectName);
    System.out.println();
    System.out.println("Configuration Attributes:");
    showAttributes(objectName, getGatewaySenderConfigurationAttributes());
    System.out.println();
    System.out.println("Queue Attributes:");
    showAttributes(objectName, getGatewaySenderQueueAttributes());
  }
}

private void showAttributes(ObjectName objectName, List<String> attributesToDump) throws Exception {
  MBeanInfo info = this.mbs.getMBeanInfo(objectName);
  MBeanAttributeInfo[] attributes = info.getAttributes();
  for (MBeanAttributeInfo attribute : attributes) {
    if (attributesToDump.isEmpty() || attributesToDump.contains(attribute.getName())) {
      Object value = null;
      try {
        value = this.mbs.getAttribute(objectName, attribute.getName());
      } catch (Exception e) {} /** throws NPE if the value is null **/
      System.out.println("\t" + attribute.getName() + "->" + valueAsString(value));
    }
  }
}

This code will dump output like below. The Configuration Attributes are the static configuration attributes. The Queue Attributes are the dynamic queue attributes.

The distributed system contains the following 2 senders:
=======================================================
GemFire:service=GatewaySender,gatewaySender=ny,type=Member,member=ln-1
Configuration Attributes:
	GatewayEventFilters->null
	SenderId->ny
	SocketBufferSize->524288
	BatchConflationEnabled->false
	PersistenceEnabled->false
	DiskSynchronous->true
	RemoteDSId->1
	ManualStart->false
	OverflowDiskStoreName->null
	MaximumQueueMemory->100
	SocketReadTimeout->0
	BatchTimeInterval->1000
	AlertThreshold->100
	DispatcherThreads->5
	OrderPolicy->KEY
	Parallel->true
	BatchSize->100
Queue Attributes:
	EventQueueSize->219
	Primary->true
	Connected->true
	Running->true
	Paused->false
	EventsReceivedRate->10701.0
	EventsQueuedRate->10701.0
	TotalEventsConflated->0
	BatchesDispatchedRate->54.0
	AverageDistributionTimePerBatch->358332
	TotalBatchesRedistributed->1655
	EventsExceedingAlertThreshold->0
	GatewayReceiver->server-2:5334
=======================================================
GemFire:service=GatewaySender,gatewaySender=ny,type=Member,member=ln-2
  ...
GatewayReceiver Attributes

The attached DumpGatewayReceivers JMX client dumps GatewayReceiver attributes using code like:

private void dumpGatewayReceivers() throws Exception {
  ObjectName patternName = ObjectName.getInstance("GemFire:service=GatewayReceiver,type=Member,member=*");
  Set<ObjectName> objectNames = this.mbs.queryNames(patternName, null);
  System.out.println("The distributed system contains the following " + objectNames.size() + " receivers:");
  for (ObjectName objectName : objectNames) {
    System.out.println("=======================================================");
    System.out.println(objectName);
    System.out.println();
    System.out.println("Configuration Attributes:");
    showAttributes(objectName, getGatewayReceiverConfigurationAttributes());
    System.out.println();
    System.out.println("Connection Attributes:");
    showAttributes(objectName, getGatewayReceiverConnectionAttributes());
  }
}

This code will dump output like below. The Configuration Attributes are the static configuration attributes. The Connection Attributes are the dynamic connection attributes.

The distributed system contains the following 1 receivers:
=======================================================
GemFire:service=GatewayReceiver,type=Member,member=ny-1
Configuration Attributes:
	Port->5334
	BindAddress->
	SocketBufferSize->524288
	MaximumTimeBetweenPings->60000
	StartPort->5500
	EndPort->5500
	GatewayTransportFilters->null
Connection Attributes:
	Running->true
	ConnectionLoad->0.0
	LoadPerConnection->0.0
	QueueLoad->0.0
	LoadPerQueue->1.0
	DuplicateBatchesReceived->0
	OutoforderBatchesReceived->0
	ClientConnectionCount->12
	ConnectionThreads->0
	GetRequestAvgLatency->0
	GetRequestRate->0.0
	PutRequestAvgLatency->0
	PutRequestRate->0.0
	ThreadQueueSize->0
	TotalConnectionsTimedOut->0
	TotalFailedConnectionAttempts->0
	TotalReceivedBytes->49850096
	TotalSentBytes->145416
	UpdateRequestsRate->0.0
	DestroyRequestsRate->0.0
	EventsReceivedRate->0.0
	CreateRequestsRate->0.0
	NumGateways->2
	AverageBatchProcessingTime->1
	ConnectedGatewaySenders->[server1(ln-2:54496)<v2>:1027, server2(ln-1:54472)<v1>:1026]

Important GatewaySender and GatewayReceiver Attributes

GatewaySender Attributes

AttributeDescription
RunningWhether or not the GatewaySender is running. If it is running, then events will be queued. This does not mean the GatewaySender is connected to a remote site.
ConnectedWhether or not the sender is connected to the remote receiver.
EventQueueSizeThe size of the sender event queue. This attribute should remain fairly steady. If it is continuously growing, then the put rate into the queue is faster than the take rate from the queue.
TotalBatchesRedistributedThe number of batches redistributed. This attribute should be 0 most of the time. If it is a flat-line above 0, then it generally means the connection has been lost.


GatewayReceiver Attributes

AttributeDescription
RunningWhether or not the GatewayReceiver is running. If it is running, then remote GatewaySenders can connect to it.
NumGatewaysThe number of remote GatewaySenders connected to this GatewayReceiver. If no remote GatewaySenders are joining or leaving, then this attribute should remain steady.
ConnectedGatewaySendersThe actual remote GatewaySenders connected to this GatewayReceiver. If no remote GatewaySenders are joining or leaving, then this attribute should remain steady.
ClientConnectionCountThe actual number of remote Gateway Sender threads connected to this GatewayReceiver. If no remote GatewaySenders are joining or leaving, then this attribute should remain steady.

gfsh

gfsh can be used to dump most of the above GatewaySender and GatewayReceiver statistics. The GatewaySender statistics include status, queued events and receiver location. The GatewayReceiver statistics include sender count and senders connected (although this data is a String[] and not printed correctly).

To log these statistics using gfsh, execute the list gateways command like:

gfsh>list gateways
GatewaySender
GatewaySender Id |            Member            | Remote Cluster Id |   Type   | Status  | Queued Events | Receiver Location
---------------- | ---------------------------- | ----------------- | -------- | ------- | ------------- | ------------
ny               | server1(ln-1:54472)<v1>:1026 | 1                 | Parallel | Running | 505           | server1:5334
ny               | server2(ln-2:54496)<v2>:1027 | 1                 | Parallel | Running | 495           | server2:5334

GatewayReceiver
           Member            | Port | Sender Count | Sender's Connected
---------------------------- | ---- | ------------ | ---------------------------
server1(ln-1:54472)<v1>:1026 | 5224 | 2            | [Ljava.lang.String;@d5add54
server2(ln-2:54496)<v2>:1027 | 5395 | 3            | [Ljava.lang.String;@96aaf8a

Geode Statistics

Geode collects GatewaySender and GatewayReceiver statistics in the statistic archive file. These can be viewed by vsd and also by using org.apache.geode.internal.SystemAdmin. These statistics are similar to those provided by the JMX MBeans.

Note: The Geode enable-time-statistics property must be true to see any time statistics.

SystemAdmin

Here is an example of using org.apache.geode.internal.SystemAdmin to dump GatewaySender statistics:

java -classpath $GEODE/lib/geode-dependencies.jar org.apache.geode.internal.SystemAdmin stats gatewaySenderStats-ny:GatewaySenderStatistics -prunezeros -archive=server-ln-1/cacheserver.gfs
[info] Found 1 instances matching "gatewaySenderStats-ny:GatewaySenderStatistics":
gatewaySenderStats-ny, 77084, GatewaySenderStatistics: "2017/11/01 14:58:30.881 PDT" samples=199
  eventsReceived operations/sec: samples=198 min=0 max=6667 average=3999.18 stddev=2890.73 last=0
  eventsQueued operations/sec: samples=198 min=0 max=6667 average=3999.18 stddev=2890.74 last=0
  eventQueueTime nanoseconds/sec: samples=198 min=0 max=72906046 average=31734257.08 stddev=21363193.38 last=0
  eventQueueSize operations: samples=199 min=0 max=112681 average=13049.07 stddev=29332.69 last=0
  eventsDistributed operations/sec: samples=198 min=0 max=43761.24 average=2040.69 stddev=4131.01 last=0
  batchDistributionTime nanoseconds/sec: samples=198 min=0 max=4686581360 average=49434147.18 stddev=402748256.61 last=0
  batchesDistributed operations/sec: samples=198 min=0 max=437.56 average=20.46 stddev=41.29 last=0
  batchesRedistributed operations/sec: samples=198 min=0 max=105 average=15.46 stddev=35.49 last=0

Here is an example of using org.apache.geode.internal.SystemAdmin to dump GatewayReceiver statistics:

 java -classpath $GEODE/lib/geode-dependencies.jar org.apache.geode.internal.SystemAdmin stats :GatewayReceiverStatistics -prunezeros -archive=server-ny-1/cacheserver.gfs
[info] Found 1 instances matching ":GatewayReceiverStatistics":
boglesbymac-2-0.0.0.0/0.0.0.0:5207, 77128, GatewayReceiverStatistics: "2017/11/01 15:00:58.794 PDT" samples=40
  eventsReceived operations/sec: samples=39 min=0 max=105289.42 average=9829.61 stddev=18704.98 last=0
  updateRequest operations/sec: samples=39 min=0 max=105776.45 average=9843.46 stddev=18784.48 last=0
  processBatchRequests operations/sec: samples=39 min=0 max=1052.89 average=98.54 stddev=186.93 last=0
  readProcessBatchRequestTime nanoseconds/sec: samples=39 min=0 max=5942678.64 average=310920.62 stddev=993112.06 last=0
  processBatchTime nanoseconds/sec: samples=39 min=0 max=12642215246.51 average=720349961.52 stddev=2555877058.88 last=0
  processBatchResponses operations/sec: samples=39 min=0 max=1062.87 average=98.77 stddev=188.2 last=0
  writeProcessBatchResponseTime nanoseconds/sec: samples=39 min=0 max=360193653.35 average=16763642.25 stddev=63235628.08 last=0
  batchSize bytes/sec: samples=39 min=0 max=17275122.75 average=1613351.64 stddev=3069615.1 last=0
  currentClientConnections sockets: samples=40 min=0 max=10 average=9.62 stddev=1.75 last=0
  currentClients clients: samples=40 min=0 max=2 average=1.93 stddev=0.35 last=0
  abandonedReadRequests requests/sec: samples=39 min=0 max=4.98 average=0.13 stddev=0.8 last=0
  receivedBytes bytes/sec: samples=39 min=0 max=17293021.96 average=1615026.33 stddev=3071744.47 last=0
  sentBytes bytes/sec: samples=39 min=0 max=37200.6 average=3456.81 stddev=6586.88 last=0
  messagesBeingReceived messages: samples=40 min=0 max=10 average=0.75 stddev=2.58 last=0
  messageBytesBeingReceived bytes: samples=40 min=0 max=164194 average=12356.85 stddev=42493.17 last=0
  loadPerConnection load: samples=40 min=0 max=0 average=0 stddev=0 last=0
  loadPerQueue load: samples=40 min=1 max=1 average=1 stddev=0 last=1

Note: The org.apache.geode.internal.SystemAdmin stats command has other options. Execute the command below to see all available options.

java -classpath $GEODE/lib/geode-dependencies.jar org.apache.geode.internal.SystemAdmin stats -h

GatewayEventFilter

A GatewayEventFilter can be used to process individual events and collect statistics like:

Some of these overlap the existing Geode statistics. The queue time, maximum, minimum and per event statistics are the most interesting.

There are several ways to track these statistics including:

I attached examples of both of these to this document.

TimingGatewayEventFilter using in-memory statistics

The attached TimingGatewayEventFilter using the in-memory statistics periodically logs output like:

GatewaySenderQueueStatistics[receivedEvents=40; transmittedEvents=0; acknowledgedEvents=0; totalQueueTime=0; minimumQueueTime=9223372036854775807; maximumQueueTime=0; queueTimePerEvent=0; totalTransmitTime=0; minimumTransmitTime=9223372036854775807; maximumTransmitTime=0; transmitTimePerEvent=0]
GatewaySenderQueueStatistics[receivedEvents=4740; transmittedEvents=4380; acknowledgedEvents=4280; totalQueueTime=547035; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=124; totalTransmitTime=173661; minimumTransmitTime=5; maximumTransmitTime=329; transmitTimePerEvent=40]
GatewaySenderQueueStatistics[receivedEvents=14972; transmittedEvents=14580; acknowledgedEvents=14580; totalQueueTime=1336783; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=91; totalTransmitTime=215983; minimumTransmitTime=1; maximumTransmitTime=329; transmitTimePerEvent=14]
GatewaySenderQueueStatistics[receivedEvents=37003; transmittedEvents=36380; acknowledgedEvents=36380; totalQueueTime=2406136; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=66; totalTransmitTime=269442; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=7]
GatewaySenderQueueStatistics[receivedEvents=60357; transmittedEvents=59880; acknowledgedEvents=59780; totalQueueTime=3515190; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=58; totalTransmitTime=301761; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=5]
GatewaySenderQueueStatistics[receivedEvents=86310; transmittedEvents=85680; acknowledgedEvents=85680; totalQueueTime=4702961; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=54; totalTransmitTime=332158; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=3]
GatewaySenderQueueStatistics[receivedEvents=112212; transmittedEvents=111880; acknowledgedEvents=111880; totalQueueTime=5862646; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=52; totalTransmitTime=363840; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=3]
GatewaySenderQueueStatistics[receivedEvents=138400; transmittedEvents=137680; acknowledgedEvents=137580; totalQueueTime=7020883; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=50; totalTransmitTime=394095; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=2]
GatewaySenderQueueStatistics[receivedEvents=164031; transmittedEvents=163480; acknowledgedEvents=163480; totalQueueTime=8189469; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=50; totalTransmitTime=423705; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=2]
GatewaySenderQueueStatistics[receivedEvents=189065; transmittedEvents=188580; acknowledgedEvents=188580; totalQueueTime=9372220; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=49; totalTransmitTime=461048; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=2]
GatewaySenderQueueStatistics[receivedEvents=213559; transmittedEvents=212980; acknowledgedEvents=212980; totalQueueTime=10491244; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=49; totalTransmitTime=492301; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=2]
GatewaySenderQueueStatistics[receivedEvents=239621; transmittedEvents=239080; acknowledgedEvents=239080; totalQueueTime=11647352; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=48; totalTransmitTime=524223; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=2]
GatewaySenderQueueStatistics[receivedEvents=265541; transmittedEvents=264980; acknowledgedEvents=264880; totalQueueTime=12802016; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=48; totalTransmitTime=557715; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=2]
GatewaySenderQueueStatistics[receivedEvents=289345; transmittedEvents=288880; acknowledgedEvents=288880; totalQueueTime=13924716; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=48; totalTransmitTime=587959; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=2]
GatewaySenderQueueStatistics[receivedEvents=315722; transmittedEvents=314880; acknowledgedEvents=314880; totalQueueTime=15082527; minimumQueueTime=0; maximumQueueTime=636; queueTimePerEvent=47; totalTransmitTime=619324; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=1]
...
GatewaySenderQueueStatistics[receivedEvents=1000000; transmittedEvents=1000000; acknowledgedEvents=1000000; totalQueueTime=45622887; minimumQueueTime=0; maximumQueueTime=1102; queueTimePerEvent=45; totalTransmitTime=1461106; minimumTransmitTime=0; maximumTransmitTime=329; transmitTimePerEvent=1]

TimingGatewayEventFilter using custom Geode statistics

The custom Geode statistics generated by the attached TimingGatewayEventFilter can be dumped using org.apache.geode.internal.SystemAdmin like:

 java -classpath $GEODE/lib/geode-dependencies.jar org.apache.geode.internal.SystemAdmin stats gatewaySenderQueueStatistics-ny:GatewaySenderQueueStatistics -prunezeros -archive=server-ln-1/cacheserver.gfs
[info] Found 1 instances matching "gatewaySenderQueueStatistics-ny:GatewaySenderQueueStatistics":
gatewaySenderQueueStatistics-ny, 78175, GatewaySenderQueueStatistics: "2017/11/01 16:19:09.369 PDT" samples=127
  receivedEvents events/sec: samples=126 min=0 max=15808 average=11696.45 stddev=6229.44 last=0
  transmittedEvents events/sec: samples=126 min=0 max=16000 average=11696.46 stddev=6234.89 last=0
  acknowledgedEvents events/sec: samples=126 min=0 max=15900 average=11696.47 stddev=6236.99 last=0
  minimumQueueTime milliseconds: samples=127 min=0 max=4 average=0.03 stddev=0.35 last=0
  maximumQueueTime milliseconds: samples=127 min=0 max=1054 average=658.16 stddev=209.79 last=1054
  totalQueueTime milliseconds/sec: samples=126 min=0 max=677995.01 average=500487.92 stddev=255838.05 last=0
  minimumTransmitTime milliseconds: samples=127 min=0 max=201 average=1.66 stddev=17.84 last=0
  maximumTransmitTime milliseconds: samples=127 min=0 max=309 average=298.44 stddev=54.83 last=309
  totalTransmitTime milliseconds/sec: samples=126 min=0 max=132408.59 average=12257.8 stddev