Child pages
  • KIP-360: Improve handling of unknown producer

Comment: Include mention of INVALID_PRODUCER_ID_MAPPING error


  1. Enter the ABORTABLE_ERROR state. The only way to recover is for the user to abort the current transaction.
  2. Use the InitProducerId API to request an epoch bump using the current epoch.
  3. If another producer has already bumped the epoch, this will result in a fatal INVALID_PRODUCER_EPOCH error.
  4. If the epoch bump succeeds, the producer will reset sequence numbers back to 0 and continue after the next transaction begins.
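As a rough illustration of the four steps above, the following Python sketch models the client side as a small state machine. All class, method, and exception names here are hypothetical and are not part of the Kafka client; the `init_producer_id` callable merely stands in for the real InitProducerId request, and the sketch assumes the epoch bump is requested as part of aborting the transaction.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class State(Enum):
    READY = auto()
    ABORTABLE_ERROR = auto()
    FATAL_ERROR = auto()


class ProducerFencedSketch(Exception):
    """Hypothetical stand-in for the fatal error returned when fenced."""


@dataclass
class ProducerSketch:
    producer_id: int
    epoch: int
    next_sequence: int = 0
    state: State = State.READY

    def on_recoverable_error(self) -> None:
        # Step 1: only an abort by the user can clear this state.
        self.state = State.ABORTABLE_ERROR

    def abort_and_recover(self, init_producer_id: Callable[[int, int], int]) -> None:
        # Step 2: request an epoch bump, passing the epoch we currently hold.
        try:
            new_epoch = init_producer_id(self.producer_id, self.epoch)
        except ProducerFencedSketch:
            # Step 3: another producer already bumped the epoch; this is fatal.
            self.state = State.FATAL_ERROR
            raise
        # Step 4: adopt the bumped epoch and restart sequence numbers at 0.
        self.epoch = new_epoch
        self.next_sequence = 0
        self.state = State.READY
```

For example, calling `on_recoverable_error()` and then `abort_and_recover(lambda pid, epoch: epoch + 1)` on a producer at epoch 3 leaves it at epoch 4 with its next sequence number reset to 0.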


  1. No epoch is provided: the current epoch will be bumped and the last epoch will be set to -1.
  2. Epoch is provided:
    1. Provided epoch matches current epoch: the last epoch will be set to the current epoch, and the current epoch will be bumped.
    2. Provided epoch matches last epoch: the current epoch will be returned.
    3. Otherwise: return INVALID_PRODUCER_EPOCH.
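The epoch rules above can be sketched as a single function. This is a minimal Python model, not the coordinator's actual code: `EpochState`, `handle_init_producer_id`, and `InvalidProducerEpochSketch` are hypothetical names, "no epoch provided" is modelled as `None`, and epoch overflow is ignored.

```python
from dataclasses import dataclass
from typing import Optional

NO_EPOCH = -1  # value the last epoch takes when no epoch was provided


class InvalidProducerEpochSketch(Exception):
    """Hypothetical stand-in for the INVALID_PRODUCER_EPOCH error."""


@dataclass
class EpochState:
    current_epoch: int
    last_epoch: int = NO_EPOCH


def handle_init_producer_id(state: EpochState,
                            provided_epoch: Optional[int] = None) -> int:
    """Apply the rules above and return the epoch the producer should use."""
    if provided_epoch is None:
        # Case 1: no epoch provided, so bump and forget the last epoch.
        state.current_epoch += 1
        state.last_epoch = NO_EPOCH
    elif provided_epoch == state.current_epoch:
        # Case 2.1: remember the current epoch as the last one, then bump.
        state.last_epoch = state.current_epoch
        state.current_epoch += 1
    elif provided_epoch == state.last_epoch:
        # Case 2.2: the provided epoch was already bumped; return the
        # current epoch without bumping again.
        pass
    else:
        # Case 2.3: the provided epoch matches neither known epoch.
        raise InvalidProducerEpochSketch(provided_epoch)
    return state.current_epoch
```

Case 2.2 is what makes a retried request safe in this model: sending the same epoch twice yields the same bumped epoch both times rather than bumping twice.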

Another case we want to handle is InvalidProducerIdMapping. This error occurs after the producerId has expired. Following expiration, either another producerId has been installed in its place (if another producer instance has become active), or the mapping is empty. We can safely retry InitProducerId with the logic in this KIP to detect which case applies:

  1. After receiving INVALID_PRODUCER_ID_MAPPING, the producer can send InitProducerId using the current producerId and epoch.
  2. If no mapping exists, the coordinator can generate a new producerId and return it. If a transaction is in progress on the client, it will have to be aborted, but the producer can continue afterwards.
  3. Otherwise, if a different producerId has been assigned, we can return INVALID_PRODUCER_EPOCH, since that is effectively what has happened. This is intended to simplify error handling: there is a newer producer, and it doesn't matter whether it has the same producerId or not.
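To make the two outcomes concrete, here is a small Python sketch of how a coordinator might resolve the retried InitProducerId. Everything in it is an assumption made for illustration: the `CoordinatorSketch` class, keying the mapping by transactional id, allocating new producerIds from a counter, and starting a fresh producerId at epoch 0 are not taken from the KIP text.

```python
import itertools
from typing import Dict, Tuple


class InvalidProducerEpochSketch(Exception):
    """Hypothetical stand-in for the INVALID_PRODUCER_EPOCH error."""


class CoordinatorSketch:
    def __init__(self, first_new_id: int = 1000) -> None:
        # transactional id -> (producer_id, epoch); both the key choice and
        # the id allocator are assumptions of this sketch.
        self._mapping: Dict[str, Tuple[int, int]] = {}
        self._ids = itertools.count(first_new_id)

    def install(self, txn_id: str, producer_id: int, epoch: int) -> None:
        self._mapping[txn_id] = (producer_id, epoch)

    def init_after_invalid_mapping(self, txn_id: str, producer_id: int,
                                   epoch: int) -> Tuple[int, int]:
        entry = self._mapping.get(txn_id)
        if entry is None:
            # No mapping exists: generate a new producerId and return it.
            # Starting it at epoch 0 is an assumption of this sketch.
            fresh = (next(self._ids), 0)
            self._mapping[txn_id] = fresh
            return fresh
        if entry[0] != producer_id:
            # A different producerId is installed, so a newer producer exists.
            raise InvalidProducerEpochSketch(producer_id)
        # The same producerId is still mapped: the ordinary epoch rules would
        # apply. They are not modelled here; the stored entry is returned.
        return entry
```

On the client side, a successful return still means any transaction in progress has to be aborted before the producer continues with the new producerId.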

Prolonged producer state retention: As proposed in KAFKA-7190, we will alter the behavior of the broker to retain the cached producer state even after it has been removed from the log. Previously we attempted to keep the producer state consistent with the contents of the log so that we could rebuild it from the log if needed. However, it is rarely necessary to rebuild producer state, and it is more useful to retain the state we have for as long as possible. Here we propose to remove it only when the transactional id expiration time has passed.
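A minimal Python sketch of the proposed retention rule follows. It is only a model: the `ProducerStateCacheSketch` class and its methods are hypothetical, and measuring expiration from the last timestamp seen for each producerId is an assumption. The constructor argument corresponds in spirit to the broker's `transactional.id.expiration.ms` setting.

```python
from typing import Dict, Iterable, List


class ProducerStateCacheSketch:
    def __init__(self, transactional_id_expiration_ms: int) -> None:
        self._expiration_ms = transactional_id_expiration_ms
        self._last_seen_ms: Dict[int, int] = {}

    def record(self, producer_id: int, timestamp_ms: int) -> None:
        # Remember when this producerId was last seen.
        self._last_seen_ms[producer_id] = timestamp_ms

    def on_log_data_removed(self, producer_ids: Iterable[int]) -> None:
        # Proposed behavior: keep the cached state even though the
        # corresponding records are gone from the log, so nothing happens.
        return None

    def expire(self, now_ms: int) -> List[int]:
        # Drop state only once the expiration time has passed.
        expired = [pid for pid, seen in self._last_seen_ms.items()
                   if now_ms - seen >= self._expiration_ms]
        for pid in expired:
            del self._last_seen_ms[pid]
        return expired

    def __contains__(self, producer_id: int) -> bool:
        return producer_id in self._last_seen_ms
```

In this model, removing data from the log never evicts a producerId; only `expire` does, and only after the configured interval has elapsed.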