Status

Current state: Accepted

Discussion thread: https://lists.apache.org/thread/zg5jbs4ogqgv7d9qwzvb5rp5vd5y2soc

Vote thread: https://lists.apache.org/thread/s3xo06ow8xz1vsg71lwkjn04qbklny3w

JIRA: KAFKA-17057

Released: 

Motivation

With KAFKA-16508, we changed Kafka Streams to call the ProductionExceptionHandler for one special case: a retriable TimeoutException thrown for a potentially missing output topic (we cannot know for sure, as metadata propagation is asynchronous). The goal was to break an infinite retry loop.

However, this behavior is not very flexible, as users might also want to keep retrying.

Public Interfaces

Add a new return option RETRY to the existing ProductionExceptionHandlerResponse:

public interface ProductionExceptionHandler extends Configurable {

    enum ProductionExceptionHandlerResponse {
        // existing options

        /* continue processing */
        CONTINUE(0, "CONTINUE"),
        /* fail processing */
        FAIL(1, "FAIL"),

        // newly added option

        /* retry the operation -- might imply throwing a TaskCorruptedException and retrying from the last committed offset;
           only valid to return this option if the passed in exception is a RetriableException;
           if returned for a non-retriable exception, it will be interpreted as FAIL */
        RETRY(2, "RETRY");

        // constructor and other members omitted
    }
}

Proposed Changes

We propose to add a new option, ProductionExceptionHandlerResponse.RETRY, that a production exception handler can return when the passed-in exception is a RetriableException. If this option is returned for a non-retriable exception, it will be interpreted as FAIL.
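
The fallback rule can be illustrated in isolation. The following is a minimal sketch, not the Kafka Streams implementation: `Response`, `RetriableError`, and `resolve` are hypothetical stand-ins for ProductionExceptionHandlerResponse, RetriableException, and the internal logic that interprets the handler's return value.

```java
// Hypothetical, self-contained sketch; none of these types are the real Kafka classes.
class RetryFallbackSketch {

    // Stand-in for ProductionExceptionHandlerResponse.
    enum Response { CONTINUE, FAIL, RETRY }

    // Stand-in for org.apache.kafka.common.errors.RetriableException.
    static class RetriableError extends RuntimeException {
        RetriableError(String message) {
            super(message);
        }
    }

    // Applies the proposed rule: RETRY is only honored for retriable exceptions
    // and is treated as FAIL otherwise. CONTINUE and FAIL pass through unchanged.
    static Response resolve(Response returned, Exception exception) {
        if (returned == Response.RETRY && !(exception instanceof RetriableError)) {
            return Response.FAIL;
        }
        return returned;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Response.RETRY, new RetriableError("timeout")));       // RETRY
        System.out.println(resolve(Response.RETRY, new IllegalStateException("fatal")));  // FAIL
        System.out.println(resolve(Response.CONTINUE, new IllegalStateException("x")));   // CONTINUE
    }
}
```

Under this reading, a handler that returns RETRY unconditionally cannot cause a retry loop on a non-retriable error, which is the reason the proposal treats the downgrade to FAIL as safe.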

We further propose to update the logic of the existing default handler, DefaultProductionExceptionHandler, to check for retriable exceptions and return RETRY instead of FAIL. While we consider the change of KAFKA-16508 a bug fix, updating the existing handler preserves backward compatibility and seems to provide a better default behavior.
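
The proposed default behavior amounts to a single type check. The sketch below assumes a simplified, hypothetical `handle` method that takes only the exception; the real handler also receives the failed record and additional context, which are omitted here. `Response` and `RetriableError` are again stand-ins, not Kafka classes.

```java
// Hypothetical sketch of the proposed default handler logic; not the real Kafka class.
class DefaultHandlerSketch {

    // Stand-in for ProductionExceptionHandlerResponse.
    enum Response { CONTINUE, FAIL, RETRY }

    // Stand-in for org.apache.kafka.common.errors.RetriableException.
    static class RetriableError extends RuntimeException {
        RetriableError(String message) {
            super(message);
        }
    }

    // Simplified signature (assumption): retriable errors are retried,
    // everything else fails as before.
    static Response handle(Exception exception) {
        return exception instanceof RetriableError ? Response.RETRY : Response.FAIL;
    }

    public static void main(String[] args) {
        System.out.println(handle(new RetriableError("topic metadata not available")));  // RETRY
        System.out.println(handle(new IllegalArgumentException("record too large")));    // FAIL
    }
}
```

Users who prefer to stop retrying in the retriable case could supply their own handler that returns FAIL or CONTINUE instead.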

Compatibility, Deprecation, and Migration Plan

We only add a new return option, and thus no backward compatibility concerns arise.

Test Plan

Regular unit and integration testing is sufficient.

Documentation Plan

Update relevant JavaDocs and the web page docs.

Rejected Alternatives

We propose to interpret RETRY as FAIL for non-retriable exceptions. An alternative would be to add a new method to ProductionExceptionHandler that is called for retriable errors only, together with a new RetriableResponse enum that is the return type of this method and the only place where the RETRY option is offered. While this alternative might express the semantics a little more strictly, it seems overkill to expand the API surface area, and the proposed interpretation of RETRY as FAIL seems sound.
