To be Reviewed By: November 16th, 2020

Authors: Mario Ivanac

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

Currently, we can configure gateway sender, only at creation of gateway sender.

So if we want to modify any gateway sender attribute, we need to perform following steps:

  • detach gateway sender from region
  • destroy gateway sender
  • create gateway sender with modified attributes
  • attach gateway sender to region

Performing all these steps while traffic is ongoing, will result with data incosistency.


Due to that, we would like to introduce new command "alter gateway-sender" which would update some parameters in runtime. For now parameters of interest are:

  • alert-threshold
  • batch-size
  • batch-time-interval
  • group-transaction-events
  • gateway-event-filter


Anti-Goals


Solution

Implement new command "alter gateway-sender" which will update gateway senders parameters in runtime. Parameters will be updated for gateway senders which have same ID as specified in command.

Current proposal will cover only changes for following parameters:

  • alert-threshold
  • batch-size
  • batch-time-interval
  • group-transaction-events
  • gateway-event-filter


After command is executed, changes will be applied to specified gateway senders (ID), on specified (or all) members, and also in cluster configuration.

To avoid any interference, dispatch thread will be paused in moment when changes are applied for sender on each member (similar as it is done for "load-balance gateway-sender" command).


PR with proposed solution: https://github.com/apache/geode/pull/5630

Changes and Additions to Public Interfaces

New methods setAlertThreshold(), setBatchSize(), setBatchTimeInterval(), setGroupTransactionEvents(), setGatewayEventFilters() will be added to GatewaySender.

Performance Impact

None

Backwards Compatibility and Upgrade Path

No upgrade or backwards compatibility issues.

Prior Art

What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?

  • No labels

11 Comments

  1. Mario Ivanac I don't know if the problem statement is clear enough as to why you want to change these at runtime. My concern around the properties:

    • group-transaction-events
    • gateway-event-filter
    • gateway-transport-filter

    I don't know if those are something we should just change without at least a "offline" mode... As we a possibly dealing with logic that we should not be changing at runtime whilst the Gateway Sender is live. I mean, the transport and event filters are list (ordered) of filters. There is nothing in the alter command that would explain to me how you intend to handled the removal of a single filter or is this a "all-or-nothing" approach?

    Also, maybe you start thinking about how the command will have to change to accommodate each of the properties and then think about how the system will respond if you do this at runtime. High load, low load, 1 server, 20 servers. Offline / Online.

    I think I would like some more thought given to how this will work and how we ensure consistency and stability of the system.

    As it stands right now, the only properties that I would want to be able to change at runtime would be:

    • batch-size
    • batch-time-interval

    As these pertain to some form of tuning.

  2. When modifying gateway-event-filter and/or gateway-transport-filter, old list will be replaced with new list. Also to avoid any interference, dispatch thread would be paused in moment when changes are applied for sender on each member (similar as it is done for "load-balance gateway-sender" command).

    Regarding system response, since PR of first draft solution already exist, we will run some test.

    But I don't see any problems.

  3. Hi there Mario Ivanac,

    I hope that you can please look at this RFC again. There are sections that need to be looked at, as they do not contain not enough information or in some cases information that is not at all useful.

    The Problem section really needs to explain WHY something is to be done, WHAT is missing, or not working and HOW it impacts the system as it stands.

    The Solution section is ideally the section that is longer. It explains HOW the problem is to be solved. In this section, one would ideally want to describe changes and impacts. Class, Component or Sequence diagrams are often useful to be included here.

    In the case of the current RFC, things like runtime vs offline changes are to be discussed and how one or the other is recommended.

    Code that has been written already, is something that you can reference, to better explain or further the readers understanding of topics.

    In some of the newer RFCs, we've added use cases. Which helps the reader understand different scenarios that are to be handled and how to handle each use case. In short, think about different scenarios of success or failure. In many cases the failure cases are most useful, as these can then be coded in tests to make sure that we always know how the system should react in case of failures.

    Please amend the RFC with the following feedback.

    --Udo

  4. Hi there Mario Ivanac,

    Thank you for updating the RFC and more clearly describing the problem.

    With this RFC in mind, I would like to propose an addition.

    How about we introduce the notion of state into the GatewaySender. We currently already know of "RUNNING", would it not make sense to explicitly "PAUSE" the GatewaySender? Then one could almost safely make changes to some of the properties and then just "UNPAUSE" the GatewaySender?

    As the underlying `isPaused` delegates to the EventProcessor anyway. That way we can more easily (and with piece of mind), make changes to the configuration.

    I would love to also see some semblance of IMMUTABILITY within a component. So maybe, rather than just updating the values on an existing GatewaySender, we actually create a GatewaySender with immutable values.

    I would love to better understand what we can do here, rather than just changing values on a component. As I think we might have more success in this area if we start dealing with immutability rather than mutable.

    Dan Smith Anilkumar Gingade Barry Oglesby what do you think of this approach?

  5. Conceptually I like the idea of being able to alter these attributes.

    We already have an
    alter async-event-queue for async event queues. It seems reasonable to have a similar command for gateway senders as well. The description of alter async-event-queue says it "needs rolling restart for new attributes to take effect." So the proposed alter gateway-sender command has different behavior than alter async-event-queue. Maybe eventually we could make alter async-event-queue not need a rolling restart.

    Is this intended to support both serial and parallel queues? In the linked PR I only see tests for serial queues at the moment. We also should test how this works with different values of dispatcher-threads. Multiple dispatcher threads create multiple queues internally.

    My biggest concern is race conditions that may happen when these values are altered at runtime. Pausing the dispatcher during the update is a good start. But there are other race conditions to worry about. A while ago we added support for altering eviction and expiration attributes on partitioned regions. We ran into several issues when geode created buckets at the same time the user altered the region. The problem was that the bucket copied some attributes from the partitioned region. Altering a parallel sender might have similar issues when we create BucketRegionQueues. We need to do extensive testing for race conditions that might occur when users alter the sender while we create buckets.

    We also need to test what happens when there is data in the queue. The attribute the concerns me the most is group-transaction-events. Does that attribute affect how the queue is ordered?

  6. Is the intention for the user to be able to modify these settings through the Java API as well? I notice that the linked PR does change the public API by introducing the following method.

    GatewaySender {...  void update(Map<String, String> runTimeGatewaySenderAttributes,
          Map<String, List<CacheCallback>> runTimeGatewaySenderFilters);}


    That's not consistent with the rest of our public java API. The attributes should be strongly typed, not just a Map of strings. Maybe we should add individual setters for these fields on GatewaySender. Or is it important to change multiple fields at once? In that case we should have some sort of GatewaySenderAttributes or GatewaySenderMutator object that containers the setter methods.


  7. Thanks for comments. We did not intend to modify attributes through Java API. And we prefer to modify multiple attributes at once.

    Also PR will be updated according to your comments.

  8. I think this is a good idea in general, although I agree with Dan that there needs to be actual Java API for all of these setters instead of just one update method. And there are some challenges with changing some of these attributes.

    Looking at the code, these attributes should be ok to change on the fly since they are batch-based. One batch will have the previous setting; the next will have the new setting.

    - batch-size
    - batch-time-interval
    - alert-threshold

    The batch-size and the batch-time-interval are used by peek. The alert-threshold is used after the batch acknowledgement.

    The group-transaction-events is also used by peek, but it can be called multiple times in SerialGatewaySenderQueue so it could have different values for the same batch in the serial case. I'm not sure if thats a problem, but it needs to be checked out. Its only called once per batch in the parallel case. I guess if you're pausing the dispatcher, it should be ok.

    The GatewayEventFilters and GatewayTransportFilters are maintained in ArrayLists. This will have to change to allow concurrent access.

    Also, the same GatewayEventFilter(s) should be invoked for the same events in each member and to have the results of the callbacks be the same in each member. Otherwise, an event may be enqueued in one member but not another. I think the sender might have to be paused from accepting new events while the GatewayEventFilters are updated. This is not pausing the dispatcher but pausing the accepting of events (before the beforeEnqueue method is invoked).

    Also, a GatewayEventFilter is called multiple times for the same event, so it will have to be able to tolerate inconsistent data depending on what it is doing.

    We've had trouble with GatewayTransportFilters in the past. Often, they require matching filters on both sides of the WAN. I'm not sure how you'll handle that case if you're adding them on the fly. If they don't require matching pairs, then pausing the dispatcher and closing its existing connection should handle updating these.

    Also, GatewayEventFilters, GatewayTransportFilters and the alert-threshold are checked in the GatewaySenderAdvisor.checkCompatibility method when the sender is created to ensure it is compatible with remote senders. I'm pretty sure there isn't an update patch that calls this method so you don't have to worry about changing these values. But you do have to make sure all members that define the sender have the same values for these properties so when they are restarted, the sender is created successfully in all members.

  9. Thanks for the comments. I will update solution, and also I will remove GatewayTransportFilters from list of parameters which need to be changed, since we can go without this change.

  10. For us, it is acceptable to modify GatewayEventFilters, while gateway sender is paused. And current proposal should cover concurrent access to filters. I will cover this part with tests.

  11. PR is updated according to last comments, and additional test are added, to verify solution.