To be Reviewed By: September 6th, 2019

Authors: Alberto Gomez (alberto.gomez@est.tech)

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

The client API for function execution is inconsistent in the following aspects:

1.Timeout in function execution client API

The native API (C++ and C#) for function execution allows to set a timeout to wait for the execution of a function in the execute() methods.

The Java API for function execution does not allow to set a timeout to wait for the execution of a function. Nevertheless, there is a possibility to set a specific timeout by using a system property when starting the client process.

Setting this timeout in the Java API has the effect of setting a read timeout in the socket. If the timeout expires during the execution of a function before data has been received, the connection is closed and if the number of retries is reached, the execute method throws an exception.

Setting the timeout in the native APIs (C++ and C#) has the effect of also setting a read timeout in the socket.

Nevertheless, how this timeout is set is not uniform across the different clients:

  • In the native C++ and C# clients, the timeout can be set on a per function execution basis by passing the timeout to the Execution::execute() method.
  • In the Java API, the timeout can only be set globally by a system property when starting the client process.

2. Timeout in ResultCollector::getResult()

Apart from the timeout on the function execution, the client API offers the possibility of setting a timeout on the getResult() method of the ResultCollector object returned by the Execution::execute() method.

Given that this method is blocking when invoked from a client (until all results have been received) then the setting of this timeout has no effect at all. In fact, the DefaultResultCollector in the Java API just ignores the value of the timeout.

Note that this timeout in the ResultCollector::getResult() method is useful when used inside a peer as function invocations are not blocking.

3. Blocking  Execution::execute()

As mentioned above the Execution::execute() method behavior is different when invoked from clients than from peers. When invoked from clients it is blocking until all results are received while when invoked from peers it is non-blocking and the ResultCollector::getResult() method is used to wait to get all the results.

This is not explicit in the documentation of the interface and it has already been captured in the following ticket: https://issues.apache.org/jira/browse/GEODE-3817

Anti-Goals

-

Solution

In order to make the API more coherent two actions are proposed:

1. Read timeout in function execution Java client API

Add two new Execution::execute() methods in the Java client to offer the possibility to set the timeout for the socket just as it is done in the C++ and C# clients.

This action is simple to implement and there are no adverse effects attached to it.

2. Timeout in ResultCollector::getResult() and Execution::execute() blocking

Regarding the timeout in the ResultCollector::getResult() method problem and the blocking/non-blocking confusion for Execution::execute() two alternatives are considered:

a) Remove the possibility of setting a timeout on the ResultCollector::getResult() method on the client side as with the current client implementation it is useless. This could be done by removing the method with the timeout parameter from the public API.

It would be advisable to make explicit in the documentation that the getResult() method does not wait for results to arrive as that should have already been done in the Execution::execute() invocation.

This alternative is very simple and would keep things pretty much as they are today.

b) Transform the Execution::execute() method on the client side into a non-blocking method.

This alternative is more complex and requires changes in all the clients. Apart from that it has implications on the public client API it requires moving the exceptions offered currently by the Execution::execute() method to the ResultCollector::getResult() and new threads will have to be managed.

An outline of a possible implementation for option b) would be:

  • Instead of invoking the ServerRegionProxy::executeFunction() directly as it is done today, create a Future that invokes this method and returns the resultCollector passed as parameter.
  • Create a new class (ProxyResultCollector) that will hold a Future for a ResultCollector and whose getResult() methods implementation would be something like:
return this.future.get().getResult();
  • After creating the future that invokes ServerRegionFunction::executeFunction() create an instance of ProxyResultCollector and call the setFuture() method on it passing the just created future and return it.

Changes and Additions to Public Interfaces

Regarding the first action, the Java client API would grow by adding the following two methods:

  • ResultCollector<OUT, AGG> execute(String functionId, long timeout, TimeUnit unit) throws FunctionException;
  • ResultCollector<OUT, AGG> execute(Function function, long timeout, TimeUnit unit) throws FunctionException;

Depending on the option selected on the second action:

  • For option a) the methods to set a timeout on the ResultCollector::getResult() would be removed on the client side or it would be documented that the timeout has no effect
  • For option b) the client documentation would be updated to indicate that the Execute::execute() methods are non-blocking.

Performance Impact

If option b) from action 2 is selected, some decrease in the client's performance will be observed due to the use of new threads in order to implement the non-blocking version of execute.

Backwards Compatibility and Upgrade Path

Depends on the option selected.

If the timeout for the ResultCollector::getResult() is deprecated, this could be done by updating the documentation in which case there will be no impacts on clients.

If the Execution::execute() method is made non-blocking there could be impacts on clients that expected the method to be blocking. In this case, it may be necessary to offer a configuration setting to offer both behaviors (blocking/non-blocking) in the release where the blocking version is deprecated.

Prior Art

-

FAQ

-

Errata

The option selected after having some discussion on the list is option 1. with some additions:

  • Add two new methods to the Execute class in the Java API including the timeout:
    • execute(String functionId, long timeout, TimeUnit unit)
    • execute(Function function, long timeout, TimeUnit unit);
  • Document the blocking behavior of the current Execution::execute() methods without timeout in the Java API.
  • Implement the Execution::execute() methods with timeout in both the server and the client.

The following JIRA addresses the changes: GEODE-7049 - Getting issue details... STATUS



  • No labels