To be Reviewed By: 14 March 2022
Authors: Mario Ivanac
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
When several gateway receivers share the same hostname-for-senders value (for example, when Geode runs under Kubernetes and a load balancer distributes connections among the remote servers), the number of connections in the gateway sender pool used for sending ping messages has been observed to be much greater than the number of dispatcher threads. In this case a single connection would be enough, since all destinations have the same address and port. There are two reasons for this behavior:
- All ping tasks are triggered in parallel, so the tasks generally request a connection at the same time, and a new connection is created for each task.
- Suppose only one dispatcher thread is configured on the local server and the remote site has five servers. The pool tries to ping, say, server0. It opens a new connection to reach server0 but, because all servers share the same VIP:PORT, it will probably be connected to a different server, say server1. It uses this connection to ping server0, and the distributed ping functionality takes care of delivering the ping on the receiving side. However, the gateway sender pool notices that it now has a new endpoint, server1, and wants to ping it as well. Now both server0 and server1 must be pinged, and the pool can end up pinging all five servers when only server0 is actually needed.
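The second effect can be illustrated with a small, self-contained simulation. This is a simplified model and not Geode code: the class name `PingCascadeDemo`, the round-robin load balancer, and the assumption that every ping opens a fresh connection through the VIP are all hypothetical choices made to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Geode code: all receivers sit behind one VIP:PORT,
// and a round-robin load balancer decides which server answers a new connection.
class PingCascadeDemo {

  /** Returns the set of endpoints known to the sender pool after each ping round. */
  static List<Set<String>> simulate(int serverCount, int rounds) {
    Set<String> known = new LinkedHashSet<>();
    known.add("server0"); // the only endpoint that actually needs to be pinged
    int next = 1; // balancer cursor (assumption: the first new connection lands on server1)
    List<Set<String>> history = new ArrayList<>();

    for (int round = 0; round < rounds; round++) {
      // One ping task per known endpoint; each opens a connection through the VIP.
      for (String target : new ArrayList<>(known)) {
        String connected = "server" + (next % serverCount);
        next++;
        // Behavior described above: whichever server answered is registered,
        // even though the ping was meant for 'target'.
        known.add(connected);
      }
      history.add(new LinkedHashSet<>(known));
    }
    return history;
  }

  public static void main(String[] args) {
    List<Set<String>> history = simulate(5, 4);
    for (int i = 0; i < history.size(); i++) {
      System.out.println("round " + (i + 1) + ": " + history.get(i));
    }
  }
}
```

In this model the set of pinged endpoints grows each round until it covers every server behind the VIP, although only server0 was needed at the start.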
Anti-Goals
Solution
The proposed solution for the problems described above is:
- Introduce a configurable option to activate pinging toward the destinations gradually. This can be accomplished by increasing the initial delay of each ping task.
- PR with proposed solution: https://github.com/apache/geode/pull/7517
- When a ping task (which has a defined destination endpoint as a prerequisite) sends a ping message and the connected endpoint differs from the destination endpoint, do not register the new endpoint.
- PR with proposed solution: https://github.com/apache/geode/pull/7515
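The two ideas can be sketched as follows. This is a minimal illustration and not the code from the linked PRs: the class `StaggeredPingSketch`, the `staggerMillis` parameter, and the helper names are assumptions introduced for the example. Only `ScheduledExecutorService.scheduleWithFixedDelay` is a real JDK call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the implementation from the PRs.
class StaggeredPingSketch {

  // Each ping task starts later than the previous one by 'staggerMillis'
  // (a hypothetical configurable option), so tasks do not all request a
  // connection at the same instant.
  static long initialDelayMillis(int taskIndex, long staggerMillis) {
    return (long) taskIndex * staggerMillis;
  }

  // Register the connected endpoint only when it is the intended destination.
  // Otherwise the ping has still been sent over this connection, but no new
  // endpoint is added to the set that needs pinging.
  static void recordPing(Set<String> known, String destination, String connected) {
    if (destination.equals(connected)) {
      known.add(connected);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> endpoints = Arrays.asList("server0", "server1", "server2");
    long pingIntervalMillis = 300; // illustrative values only
    long staggerMillis = 100;

    ScheduledExecutorService timer = Executors.newScheduledThreadPool(1);
    for (int i = 0; i < endpoints.size(); i++) {
      String endpoint = endpoints.get(i);
      timer.scheduleWithFixedDelay(
          () -> System.out.println("ping " + endpoint),
          initialDelayMillis(i, staggerMillis),
          pingIntervalMillis,
          TimeUnit.MILLISECONDS);
    }
    Thread.sleep(500); // let a few pings run, then stop the demo
    timer.shutdownNow();
  }
}
```

With the stagger in place, a later ping task has a chance to reuse a connection that an earlier task has already returned to the pool, and the guard in `recordPing` keeps the set of pinged endpoints from growing when the load balancer connects to a different server.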
Changes and Additions to Public Interfaces
Performance Impact
No impacts.
Backwards Compatibility and Upgrade Path
No impacts.
Prior Art
What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?
FAQ
Answers to questions you’ve commonly been asked after requesting comments for this proposal.
Errata
What are minor adjustments that had to be made to the proposal since it was approved?