Status
Current state: Accepted
Discussion thread: https://lists.apache.org/thread/zg5jbs4ogqgv7d9qwzvb5rp5vd5y2soc
Vote thread: https://lists.apache.org/thread/s3xo06ow8xz1vsg71lwkjn04qbklny3w
JIRA: KAFKA-17057
Released:
Motivation
With KAFKA-16508 we changed the Kafka Streams behavior to call the ProductionExceptionHandler for a single special case: a retriable TimeoutException thrown for a potentially missing output topic (we cannot know for certain yet, because metadata propagation is asynchronous). The goal was to break an infinite retry loop.
However, this is not very flexible, because some users might want to keep retrying.
Public Interfaces
Add a new return option RETRY to the existing ProductionExceptionHandlerResponse:
public interface ProductionExceptionHandler extends Configurable {
    enum ProductionExceptionHandlerResponse {
        // existing options

        /* continue processing */
        CONTINUE(0, "CONTINUE"),
        /* fail processing */
        FAIL(1, "FAIL"),

        // newly added option

        /* retry the operation -- might imply throwing a TaskCorruptedException
         * and retrying from the last committed offset;
         * only valid to return this option if the passed in exception is a RetriableException;
         * if returned for a non-retriable exception, it will be interpreted as FAIL */
        RETRY(2, "RETRY");
    }
}
Proposed Changes
We propose to add a new option ProductionExceptionHandlerResponse.RETRY that a production exception handler can return for a RetriableException. If this option is returned for a non-retriable exception, it will be interpreted as FAIL.
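The interpretation rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the Response enum and the RetriableException class below are local stand-ins rather than Kafka's classes, and effectiveResponse is a hypothetical helper name, not part of the proposed API.

```java
// Illustration only: all types here are local stand-ins, NOT Kafka's classes.
public class RetryDowngradeSketch {

    /** Mirrors the three response options named in this KIP. */
    enum Response { CONTINUE, FAIL, RETRY }

    /** Stand-in for org.apache.kafka.common.errors.RetriableException. */
    static class RetriableException extends RuntimeException {
        RetriableException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical helper: maps the response returned by a handler to the
     * response that is actually acted upon. RETRY is honored only for
     * retriable exceptions; otherwise it is treated as FAIL.
     */
    static Response effectiveResponse(Response returned, Exception error) {
        if (returned == Response.RETRY && !(error instanceof RetriableException)) {
            return Response.FAIL;
        }
        return returned;
    }

    public static void main(String[] args) {
        Exception retriable = new RetriableException("timeout");
        Exception fatal = new IllegalStateException("not retriable");

        System.out.println(effectiveResponse(Response.RETRY, retriable)); // RETRY
        System.out.println(effectiveResponse(Response.RETRY, fatal));     // FAIL
        System.out.println(effectiveResponse(Response.CONTINUE, fatal));  // CONTINUE
    }
}
```

CONTINUE and FAIL pass through unchanged in this sketch; only a RETRY paired with a non-retriable exception is downgraded.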
We further propose to update the logic of the existing (and default) DefaultProductionExceptionHandler to check for retriable exceptions and return RETRY instead of FAIL. While we consider the change of KAFKA-16508 a bug fix, updating the existing handler preserves backward compatibility and seems to provide a better default behavior.
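A minimal sketch of the proposed default decision, assuming a simplified handle method that receives only the exception (the real handler method takes additional arguments). The enum and exception classes are local stand-ins that imitate the shape of Kafka's types; they are not the actual implementation.

```java
// Illustration only: all types here are local stand-ins, NOT Kafka's classes.
public class DefaultHandlerSketch {

    /** Mirrors the three response options named in this KIP. */
    enum Response { CONTINUE, FAIL, RETRY }

    /** Stand-in for org.apache.kafka.common.errors.RetriableException. */
    static class RetriableException extends RuntimeException {
        RetriableException(String message) {
            super(message);
        }
    }

    /** Stand-in for Kafka's TimeoutException, which is a retriable exception. */
    static class TimeoutException extends RetriableException {
        TimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Simplified (assumed) signature: return RETRY for retriable
     * exceptions and FAIL for everything else.
     */
    static Response handle(Exception error) {
        return error instanceof RetriableException ? Response.RETRY : Response.FAIL;
    }

    public static void main(String[] args) {
        // The special case from KAFKA-16508: a timeout for a possibly missing topic.
        System.out.println(handle(new TimeoutException("topic metadata not available"))); // RETRY
        System.out.println(handle(new IllegalStateException("not retriable")));            // FAIL
    }
}
```

Under this sketch, an application using the default handler keeps retrying on the timeout case, while a user who prefers to stop can plug in a custom handler that returns FAIL or CONTINUE.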
Compatibility, Deprecation, and Migration Plan
We only add a new return option, and thus no backward compatibility concerns arise.
Test Plan
Regular unit and integration testing is sufficient.
Documentation Plan
Update relevant JavaDocs and the web page docs.
Rejected Alternatives
We propose to interpret RETRY as FAIL for non-retriable exceptions. An alternative would be to add a new method to ProductionExceptionHandler that we call for retriable errors only, together with a new RetriableResponse enum, and to offer the RETRY option only on that new enum (the newly added method of ProductionExceptionHandler would return the new response enum). While this alternative might express the semantics a little more strictly, it seems overkill to expand the API surface area, and the proposed interpretation of RETRY as FAIL seems sound.