Avoid the queuing of dropped events by the primary gateway sender when the gateway sender is stopped
To be Reviewed By: July 16th, 2020
Authors: Alberto Gomez (alberto.gomez@est.tech)
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
Gateway senders drop all events received when they are stopped. Nevertheless, primary gateway senders, while stopped, store all events received in the tmpDroppedEvents
member variable of the AbstractGatewaySender
class. These events are stored so that they can be sent later (when the primary gateway sender is started) to the secondary gateway senders in order for them to remove those events from their queues. If it were not so, secondary gateway senders could have events in their queues that would never be removed.
This feature was implemented in - GEODE-4942Getting issue details... STATUS as a solution to avoid secondary gateway senders to leave un-drained events after GII.
This solution works well when stopped gateway senders are not to remain in that state for a long time, e.g., when they are stopped but in the process of starting. But, if a gateway sender is stopped to be left in that state for some time, the incoming events reaching the primary gateway sender will be stored in the mentioned member variable of AbstractGatewaySender
and could eventually provoke a heap exhaustion error. Moreover, dropped events stored while the gateway sender is stopped will not be queued by secondary gateway senders which makes the storing of the dropped events in the primary gateway sender unnecessary.
Stopping a gateway sender is an action that may be used to avoid the filling of gateway sender queues in long lasting split brain situations. But, given the current status of the implementation, it would not be effective because incoming events will still be stored by the primary gateway senders, using at least the same amount of memory (if not more if overflow to disk is configured) as the events queued by the sender when it is running, and with a very high risk of heap memory exhaustion.
Anti-Goals
As described above, dropped events in the primary gateway sender are stored in a member variable. It is out of the scope of this RFC to change how those events are stored.
Solution 1 (original proposal, deprecated)
The solution proposes to change the primary gateway sender so that it does not store dropped events when it is stopped explicitly (not while starting). The reason is that these events could never end in the queue of any secondary gateway sender and will use memory unnecessarily.
In order to do so, it is proposed to add a new boolean member variable (mustQueueDroppedEvents)
to the AbstractGatewaySender
that will tell if the primary gateway sender must store dropped events or not.
mustQueueDroppedEvents
must be set to false (do not store dropped events) in the primary and secondary gateway sender instances:- At gateway sender creation if the
--manual-start
option was used. - Right after stopping the gateway sender using the gfsh
stop gateway sender
command.
- At gateway sender creation if the
mustQueueDroppedEvents
must be set to true (store dropped events) in the primary and secondary gateway sender instances:- At gateway sender creation if the
--manual-start
option was not used or set to false. - Right before a
start gateway sender
gfsh command is executed.
- At gateway sender creation if the
The start gateway sender
and stop gateway sender
gfsh commands would be modified in order to set the value of mustQueueDroppedEvents
as follows:
- Code will be added at the end of the current
stop gateway sender
gfsh command which will setmustQueueDroppedEvents
to false (after the gateway senders have been stopped). - Code will be added at the beginning of the current
start gateway sender
gfsh command which will setmustQueueDroppedEvents
to true (before the gateway senders are started).
In case the GatewaySender
Java API is used to start/stop gateway senders, in order to get the new behavior, given that the scope of the start/stop methods is the VM on which it is invoked, it will be necessary to set the mustQueueDroppedEvents
accordingly (to true before starting all sender instances and to false after stopping all sender instances) on every VM. To set the value of the variable, the GatewaySender interface will offer the following new method: setMustQueueDroppedEvents(boolean mustQueue)
. If the new method is not used, the legacy behavior will prevail except if the gateway sender is started manually in which case dropped events will not be queued.
A draft PR of the solution can be found here: https://github.com/apache/geode/pull/5348
Changes and Additions to Public Interfaces
Two new methods: setMustQueueDroppedEvents(boolean)
and mustQueueDroppedEvents()
will be added to the GatewaySender
public interface.
Performance Impact
As the proposal implies changing the implementation of the start gateway sender
and stop gateway sender
gfsh commands to be done in two steps, these commands may be slightly slower although not significantly.
Backwards Compatibility and Upgrade Path
The proposal does not affect the rolling upgrade and has not impacts in the regular rolling upgrade process.
Solution 2 (new proposal, agreed after discussions on this RFC )
The solution consists of, instead of storing dropped events in `tmpDroppedEvents` to later send batch removal messages when the primary gateway sender is not started, try to send the batch removal message when the event to be dropped is received. That way, when the sender is stopped for a long time and there are events coming, the memory of the `AbstractGatewaySender` will not grow with entries in the `tmpDroppedEvents` member.
In order to send the batch removal message directly, the `eventProcessor` for the `AbstractGatewaySender` must have been created. If it is not yet created because the sender was created with manual start set to true, while receiving events to be dropped, they will be stored in `tmpDroppedEvents` as there is no other choice. Nevertheless, in order to consume less memory, the event stored could be a simplified event containing only the necessary information to handle it.
A draft PR of the solution can be found here: https://github.com/apache/geode/pull/5486
Changes and Additions to Public Interfaces
No changes.
Performance Impact
No impacts foreseen.
Backwards Compatibility and Upgrade Path
The proposal does not affect the rolling upgrade and has not impacts in the regular rolling upgrade process.
Prior Art
-
FAQ
Answers to questions you’ve commonly been asked after requesting comments for this proposal.
Errata
What are minor adjustments that had to be made to the proposal since it was approved?
43 Comments
Donal Evans
The changes to the
GatewaySender
interface (adding thesetMustQueueDroppedEvents()
and mustQueueDroppedEvents()
methods) constitute a change to a public API, so should be documented in the appropriate section of this RFC. Also, no mention is given to how this new behaviour will be implemented when the Java API is used to stop a gateway sender via theGatewaySender.stop()
method, which would lead to inconsistent default behaviour if not addressed.Alberto Gomez
Good points, Donal. Thanks!
I have updated the RFC describing how to use the new behavior in case the Java API is used. As you can see, there is no magic, but the one using the Java API can get the new behavior by making use of the new method to set the value for the new flag accordingly and doing the required coordination.
Jinmei Liao
I am not very familiar how gateway senders behave, but by the description of the problem here, this may be a naive question: instead of setting up a boolean, can we just check if the gateway sender is stopped or not before we put the event on the queue?
Alberto Gomez
Thanks Jinmei Liao .
Each gateway sender instance (gateway sender running in a member) has its own variable stating if the instance is stopped or not. Currently, if the sender instance is primary and the sender instance is stopped, the dropped event is queued. But this variable does not provide information about the status of the other instances. This is an excerpt of the code in
AbstractGatewaySender.java
What the proposed boolean variable is achieving is letting know the primary gateway sender instance if the gateway sender (not just the instance, but all the instances) are stopped. The variable is set to true by the stop gateway sender command in all gateway sender instances. Another name for the variable could be
allSenderInstancesStopped.
Without this variable, the primary gateway sender instance would have to contact each secondary gateway sender instance to query if they are stopped. This is not very efficient and also there could be race conditions that could make a dropped event to not be queued because the secondary sender was just stopped when queried but just a fraction of time ago could have been up and storing the event just received by the primary.
Xiaojian Zhou
I am the original author who introduce GEODE-4942 and GEODE-5087. I understand current tmpDroppedEvents contains the side-effect which is mentioned in this RFC. However, I don't think to change the APIs to introduce new parameter is a good idea to resolve it.
To enhance the original design of tmpDroppedEvents to workaround the side-effect, there are some easier ways. The simplest one is to set a timeout. Currently we did not distinguish if the eventProcessor is purposely shutdown or is in normal restarting. We can introduce a stopTime (long) as attribute, which will be set after
this.eventProcessor.isStopped(). Then before tmpDroppedEvents.add(clonedEvent), we can compare current timestamp with the saved stopTime,
if it's longer than 10 seconds, don't add to tmpDroppedEvents. I introduce the tmpDroppedEvents is only for restarting window, it should not
be long. 10 seocnds is way longer than necessary.
Xiaojian Zhou
I have a more mature idea to use the timeout to resolve this issue. I noticed AbstractGatewaySender has an attribute startTime, which will be set to current timestamp in start() of SerialGatewaySenderImpl and SerialAEQImpl. But ParallelGatewaySender never used it. We can make use of this attribute.
1) in start(), set startTime to be current timestamp as SerialGatewaySenderImpl does.
2) in stop(), set startTime to be (-1)*System.currentTimeMillis()
3) in distribute():
We only need to distinguish if the sender is down due to purposely shutdown or in the middle of restarting. Above code is enough to do it.
We can have more detail discuss if you like.
Xiaojian Zhou
Alberto Gomez I also created a draft Pull Request based on the simple solution. It will resolve the problem you mentioned. https://github.com/apache/geode/pull/5381
You can keep refining this fix.
This fix has minimum impact to current system and APIs.
Alberto Gomez
Xiaojian Zhou Thanks for your comments.
I find the solution solves the problem of not storing dropped events indefinitely when a sender is stopped. But I have some concerns:
Xiaojian Zhou
Alberto Gomez
Introducing a new stopTime is OK. I just find the startTime is never used and want to use it.
15 seconds is adjustable. I feel it's safe to increase even to 1 minute. I don't worry that much about the space used for the time being. We can enhance the item's size which are put into the tmpDroppedEvents. We only keep the tmpDroppedEvents to notify secondary queue to delete. So we might not need to clone the whole event's data structure. For example, during clone, we can set some fields to be null. Note: I means to clone a new event dedicated for tmpDroppedEvents and keep the current cloneEvent as is.
For your question 1, we have not found any bugs for that scenario yet. However, we are not supposed to shutdown one sender for long time. Either we restart it (which should finish within 15 seconds), or shutdown all senders (usually due to shutdown all members).
Alberto Gomez
Xiaojian Zhou
I think it would be a good idea to not clone the whole event to prevent using too much memory.
Regarding question 1, I have written a test case in which you can see that if you stop a sender and then start it while there are incoming operations, there could be dropped events not queued that will be received by the secondarygateway sender and never drained. If you run it several times, you'll see it fail from time to time.
This case is not handled by this solution.
Xiaojian Zhou
In my previous comment, there's a mistake. We cannot clone a simplified event, because the event will be enqueued when sender is restarted. For your test, I will take a look later.
Xiaojian Zhou
Alberto Gomez Your test failure will happen when the senders are started at the first time, when the startTime is initialized as 0. vm2 and vm4 could start in different speed, which caused different number of events are rejected at vm2 and vm4. In my reproduce, vm2 reject tailKey==100, 101, but vm4 rejected tailKay=100,101,102. If vm2 happened to be the secondary, then taikKey=102 will not be drained.
I have fixed it. See the latest code in pull request.
By the way, in your test code need to adjust a little bit. I have added your test case into the PR.
Alberto Gomez
Xiaojian Zhou Thanks for your reply. Again, the problem with this solution is that it does not fix completely the case when you start a sender.
If you start a sender right after it has been created (or within 15 seconds since the creation), the sender will queue dropped events for 15 or less seconds. But what if you first create the sender and more than 15 seconds later you start it? In this case the solution will not work. No dropped events will be queued and you could have events not drained.
Also, if you start a server that was previously stopped, you need to do it within the 15 seconds after you stopped the sender. Otherwise, no dropped events will be queued and you could have events not drained. I do not think we can assume that a stopped sender will always be restarted within the next 15 seconds. In fact, we have a use case in which we need to stop the gateway sender for a long time in order not queue more events to not fill up the heap.
You can verify this by adding sleeps of 15 seconds before sending puts in the new test case. You will see that the test case will fail sometimes.
Xiaojian Zhou
I know it might. However in general, I think nothing should be added to the queue if it's not running. If events get added to the secondary that aren't added to the primary, they should be dealt with in the first batch removal message.
Currently our batch removal message packed a key list. Only the events in the key list at secondary will be removed. We can enhance it to double check if there's any "too-old" events in secondary. Remove them if any. There're 2 alternative to identify an "too-old" event. 1) set a long timeout such as 1 hour. 2) set a boolean firstBatchRemoval to the batch removal message. If it's true, then remove all the events in secondary queue with smaller shadow keys than the smallest one in the batch.
Alberto Gomez
Xiaojian Zhou We can implement any of the alternatives you are proposing. Still I wonder if doing all this is not much more complex than the solution that I proposed originally. Plus we would have to address the issue I mentioned previously about, depending on the size of the objects, taking up too much memory due to storing 15 seconds of dropped events.
In my opinion, even if the public API needs to be changed (expanded), the original solution is much simpler and addresses the problem without any drawback.
Xiaojian Zhou
Above solution has minimum impact to current system. Your new API solution has more side effects than you thought.
Alberto Gomez
Could you please elaborate on the side effects I have not thought about?
Xiaojian Zhou
It could break backward compatibility. And any changes to public API will trigger many other components to be changed. Such as JMX and Restful API. Adding gfsh parameter is the last choice. We would rather to disable the whole tmpDroppedEvents feature to resolve your original issue.
My fix in this PR have dramatically reduce the chance of undrained event in secondary queue. Only could happen when user stopped one sender (not for restart) then wait for more than 15 seconds to manual restart. Even though, the worst case is the possibility of a few undrained events in secondary queue, and the possibility is way less than totally disabled tmpDroppedEvents.
As for manually stop one sender, it will always face the risk of missing data. We strongly recommend NOT to do it. You can either shutdown all senders, or restart one sender.
Alberto Gomez
Hi Xiaojian Zhou . The original solution does not break the backward compatibility because the changes in the API are an extension of the it - there are new methods available but you are not required to use them. Regarding gfsh, I am not proposing to add any new parameter. My proposal consists of modifying the implementation of the start and stop commands to use the new methods of the API to avoid the problems with tmpDroppedEvents.
Please, correct me if I am wrong but I have not seen in the JMX or the restful API a way to start or stop the gateway senders so I do not see how this proposal would impact them.
I am not against your proposal. I just think it has some drawbacks (you need timeouts which are always hard to tune and pose on the one configuring the system an extra burden), there are aspects which are not totally clear yet (the alternatives about how to solve the problem when starting servers) and the impacts in code and complexity are bigger. That's why I still consider the original solution better.
Finally, the proposal aims at solving the problems when starting or stopping all sender instances, not other cases. That's why the new logic is added to the start/stop gfsh commands in which all sender instances are stopped/started.
Xiaojian Zhou
Your solution cannot resolve the event undrained in secondary queue. When you manual start (or shutdown and restart using default setting), the tmpDroppedEvents is actually disabled. Then there's possibility that event will be undrained in secondary queue.
I have reproduced that using my version of ParallelWANPropagationLoopBackDUnitTest.java. You can comment out the lines to verify size of tmpDroppedEvents.
Alberto Gomez
That's because I set the new variable to true when manual start is true because I thought it would be less problematic.
We can initialize it to true and then you would have the default behavior equivalent to what we have today.
Jinmei Liao
Alberto Gomez In your solution, if a developer wants to use java API to stop a gateway sender, does he need to set the added boolean flag to a specific value? It's not very obvious to me what value I need to set it to be, so to me this added boolean flag does add API complexity and confusion.
All these seem to be some internal management of the queue sizes, so I would like to leave users out of it. Requiring users to know the details of the implementation in order to set a flag seems not user-friendly to me.
In your solution, you are using a loop to set the flag on all the members, so it's still possible that the secondary GWS (gateway sender) will get the notice later than the primary, so it could still possibly queue some events before it turns the boolean off, so that event could still not be drained.
As far as I understand, tmpDroppedEvents is used to store events that might have been queued up by the secondary GWS while the primary GWS is down. Ideally, if user didn't stop the secondary GWS but have the primary GWS stopped, we would have this tmpDroppedEvents store those events indefinitely until the primary starts again, but if we are concerned the variable will get too big, then setting a timeout seems like a reasonable solution. The problem of this approach is when user didn't stop all the secondary GWS after the timeout period after they stop the primary GWS, then we will need to ask ourselves: how likely is this to happen?
Alberto Gomez
Thanks a lot for your comments, Jinmei. Very appreciated.
I totally agree with you that the API extension proposed is not very user friendly and deals with internal management of queues that should be hidden to the user of the API. Due to how the tmpDroppedEvents feature implementation was done, I did not think I had a better choice, mainly because it is all about coordination between sender instances. I also have to say that I proposed to add it so that gfsh start/stop could make use of it and solve the problem described in the RFC and not thinking about Java API users. The latter clients could be left with the current behavior. With this proposal, the logic would be hidden inside the implementation of gfsh start and stop. Nonetheless if Java users would like to make use of it we could try to find a better name for the new methods/flag (for example: prepareToStopSender/prepareToStartSender or preStartSender/postStopSender) and add proper documentation so that it would not be impossible for the user to understand how to use the API. As this is a draft, I expected to have discussion on the namesbut for the sake of understanding what the methods/flag did in the proposal I decided to be very explicit with the name of the flag/method and probably less clear for the future user of the API.
Regarding what you comment about the loop, the idea is that before starting a sender, you must tell all sender instances they are going to be started. Once all instances have been notified (and not before as you can see in the implementation of the gfsh start sender proposal) they are sent the signal to be started. Events will not be queued in the secondary before it is started but the primary will start to put stopped events in tmpDroppedEvents as soon as it receives the change of the flag, so there is no possibility for the race condition you mention.
As I wrote in previous comments to Xiaojian Zhou, I do not think using a timeout is a good idea in general for solving problems and less so in this case in which, depending a lot in the type of traffic received, the value of the timeout could be very different and either use a lot of memory if the timeout was too high or leave undrained events if too short.
Besides, the timeout only solves the problem when stopping the sender. To solve the problem of not leaving undrained events when starting a sender, a couple of draft ideas were proposed by Xiaojian Zhou but I could foresee some problems already with them and it is clear that they add yet more complexity to this already complex feature.
For all of the above, I consider that solving the problem in the gfsh start and stop sender commands (and leaving the current behavior of start/stop gateway sender when used out of gfsh ) is the least bad solution. It is very simple in terms of code and solves perfectly the problem when gfsh is used.
Afterall it might be that what we really need is to see if there is a better way to drain event queues in the secondary that were dropped in the primary without storing the dropped events in the primary in the first place.
Jinmei Liao
If you are only set to solve the problem when gfsh commands are used, then your solution is ok. I think Gester is trying to solve the problem regardless whether gfsh is used to start/stop the GWSs.
Xiaojian Zhou
I agree to Jinmei. To write a private-used workaround is different story. To add a feature into the product for all users to use need to consider more. A feature should not assume user to call some APIs to do housekeeping in their application.
Actually, I am thinking to add some of Alberto's code into my proposal. For example, the mustQueueDroppedEvents and its setter. Then users can easily to work out a private workaround (such as using a function to call the setter on all members) if they are unhappy with our fix. The original GEODE-4942 did not fix well. I did not consider the scenario Albert mentioned. What I provided is an enhanced fix.
To write a complete java API and gfsh to change a status in all senders is a feature, which needs more work efforts and consideration.
Alberto Gomez
I agree with you guys and see valid points on your comment below.
As I told Jinmei, I aimed at solving the problem just for the gfsh case (taking into consideration the interaction with servers on different versions and failure cases like the ones you mentioned below) and to me, the solution was simple and good enough for most use cases.
To provide a complete fix for the java API I would explore the possibility of not storing dropped events by the primary at all. I am not sure if it is possible but you wrote a couple of ideas to solve the problem when starting the sender that may do it.
I did not want to go that far when I thought about fixing this problem but if you want we can have a discussion to exchange ideas to try to find the best solution next week - as this week I am on vacation. How does it sound?
Xiaojian Zhou
I have an idea to add your mustQueueDroppedEvents attribute and setMustQueueDroppedEvents() API to my PR. I will add a function to call setMustQueueDroppedEvents() on all members. However, I will put this function into dunit test as example. Then users can copy this function as a java-API in their application if they want to turn off/on tmpDroppedEvents.
If they don't care that much, my PR (using timeout) will serve the normal customers' use cases.
Actually, you can do above if you like. I think this is the best balance in current stage.
Xiaojian Zhou
Alberto Gomez
My proposal is also a draft workaround. It will help the situation in most use cases (if not all). And actually we can apply both solutions.
I am not against to introduce a new feature to set a status to all senders. However, there are the 4 major problems in your solution:
If considered all above 4 challenges, you will see why my proposal is simpler.
Alberto Gomez
So, if I understand correctly you are proposing to solve the problem in two phases:
mustQueueDroppedEvents
) via a function which will set the flag on all senders and will take care of retries, versions, etc. and update gfsh to use this API.Is this right?
Alberto Gomez
Xiaojian Zhou I had another idea based on your proposal with a timeout.
If we use the startTime and stopTime we can make sure that the solution works not only when senders are stopped (after 15 seconds no more dropped events will be queued) but also when starting them.
This solution will be good for Java API users and gfsh users and they will not be impacted.
You can check a draft PR here:
https://github.com/apache/geode/pull/5434
Alberto Gomez
Xiaojian Zhou
With this solution, when starting the gateway senders there is some very small chance that some event queued in the secondary gateway sender and dropped by the primary is not stored in the tmpDroppedEvents. I have seen the last test case of ParallelWANPropagationLoopBackDUnitTest in my PR fail 1 out of 350 executions.
Nevertheless, I think the solution is good enough until we decide to use the flags to report that all gateway sender instances are stopped.
Another thing to take into account is that this solution, when manually starting the gateway sender, will store events in tmpDroppedEvents for 15 seconds since creation. Should we in this case make the gateway not store events in tmpDroppedEvents until it is started?
Do you think we could go on to create a PR with this solution?
Xiaojian Zhou
Alberto Gomez
After discussed with the team, it will be a community development. So I will not fix it for the time-being, but I will help you if necessary.
There're following options you can choose:
1. completely introduce a java API with gfsh parameter to change all sender's status at one time. When you do this java API, you need to consider 2 negative cases: 1) retry if some member did not finish setting the status. 2) detected some servers are running in older version, then abort the API.
2. introduce the attribute and the setter, then add a function (let's name it testSetSenderStatustFunction) to call the setting in dunit test. Also create a dunit test to use gfsh to call testSetSenderStatustFunction. Then with this checkin, your application code can simulate the 2 test code to define a user function, let's name it AlbertoFunction, and in your application to use gfsh to call AlbertoFunction.
3. Use timeout to resolve the issue.
4. Use both option-2 and option3.
After you decided which option to choose, then you can have a new PR and I will review for you.
Alberto Gomez
Xiaojian Zhou Regarding option 2., are you thinking about something like what I proposed in my last commit in the original PR (https://github.com/apache/geode/pull/5348) in which a new function is created and it is called from gfsh? That draft PR is missing the retries and the checking of the version of the servers but apart from that, it can give you an idea of what it would look like.
What I do not get is the parameter for gfsh. I do not think any parameter is needed. The new function should always be called from gfsh. Otherwise, the tmpDroppedEvents will be indefinitely collected when the sender is stopped.
Xiaojian Zhou
Your PR 5348 is based on option1. If you want to change gfsh product code, you have to consider and implement all the negative cases I mentioned earlier.
My option2 means: you can add an internal function like you have done in PR, the attribute and a setter. That's all you can add into product.
Then in test code, you write a test case to verify using both java api and gfsh to call this function to change the status on all senders.
gfsh can call your own function. Jinmei knows better on that.
That's the option-2. In this case, you don't need to worry about the negative cases, because you did not change the product and have no impact to other users of geode.
In your own application, you can use the test code you wrote to do the setting. Of course, it's up to you to handle the potential negative cases.
Alberto Gomez
Xiaojian Zhou You are right, I was referring to option 1. not 2.
I will take some time to think about which one to implement. It would be option 1 or 3.
Thanks for your comments.
Alberto Gomez
Xiaojian Zhou , Jinmei Liao, I have thought of an alternative solution that is simpler than the ones we have right now and should solve all the problems described in the RFC.
I have created a draft pull request with it so that you can take a look and comment:
https://github.com/apache/geode/pull/5486
Xiaojian Zhou
I feel the solution is overall good. It only needs tiny adjustment. I have put comments in PR.
Alberto Gomez
Thanks. I have written a ticket GEODE-8491 - Getting issue details... STATUS to handle it.
Jinmei Liao
There is `ExecuteFunctionCommand` that will allow you to execute user functions on any members. I believe what Gester suggested Option 2 is your PR #5348, with the following changes:
With you PR #5348, Gester is concerned that calling the function in Stop/StartGatewaySenderCommand might still cause some events being undrained and we don't do enough testing to ensure that's not the case.
Alberto Gomez
Jinmei Liao Thanks a lot for the clarification. Now I fully understand Gester's Option 2.
Would I need to implement the function in any special way so that it can be invoked from gfsh? With the current implementation extending Function, when trying to execute from gfsh I got a "Function GatewaySenderManageFunction is not registered in member..." error.
In the case of Option 1 (sketched in PR #5348), if it was finally selected, we could add more tests to make sure that there are no events undrained.
Jinmei Liao
Alberto Gomez to register a function for member execution, call FunctionService.registerFunction in an initialization code block.
Xiaojian Zhou
Solution 2 can be simplified to be:
The tmpDroppedEvents was introduced to save the incoming events temporally before EventProcessor was created when sender is manually started at the first time. When EventProcessor is present (stopping or restarting a sender will keep the EventProcessor until the whole sender is garbage collected), no need to save incoming events into tmpDroppedEvents.
This proposal has minimum impact to current logic and has no backward compatibility concern.
Alberto Gomez
The solution finally implemented can be seen in the following JIRA ticket: https://issues.apache.org/jira/browse/GEODE-8491