Status

Current state: Accepted

Discussion thread: here 

JIRA: KAFKA-17251 

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Since KAFKA-10716, Kafka Streams persists process identity metadata on disk to keep a stable process identity across restarts, which helps stabilize task assignment.
Kafka Streams also uses an application directory lock file to prevent multiple processes from using the same local state directory concurrently.


Today, KafkaStreams#cleanUp deletes local state under state.dir and then attempts to delete the application directory itself as the final step.
However, this directory deletion can fail even when the cleanup succeeded, because the directory may still contain expected metadata files:

  • process identity metadata file: kafka-streams-process-metadata

  • application directory lock file: .lock

When the directory deletion fails, the resulting warning logs can be misleading and can cause confusion for users and operators who interpret the warning as a failure to clean local state. 
In addition, the current JavaDoc wording can be interpreted as requiring complete removal of all local artifacts for the application, including these metadata files, which is not aligned with the design goal of preserving stable process identity across restarts.

This KIP updates the public contract of KafkaStreams#cleanUp to explicitly preserve the process identity metadata and the lock file. It also clarifies that cleanUp clears local state but does not guarantee deletion of the application directory, which may be retained when expected metadata files remain.


Public Interfaces

  • org.apache.kafka.streams.KafkaStreams#cleanUp() 

This KIP updates the public contract and JavaDoc of KafkaStreams#cleanUp. There are no API signature changes.

The contract is updated to clarify the final application directory deletion step for state.dir/application.id and the user-visible logging behavior when expected metadata files remain (.lock and kafka-streams-process-metadata).
In this case, cleanup is not considered a failure and no WARN is emitted. No other public interfaces are changed.


Proposed Changes

KafkaStreams#cleanUp behavior is unchanged unless explicitly stated below.
This KIP only changes the final application directory deletion step and the associated logging when expected metadata files remain.

At the end of the cleanup process, KafkaStreams#cleanUp attempts to delete the application state directory state.dir/application.id, subject to the following rules:

  • If the directory is empty, it is deleted.
  • If the directory contains only the expected metadata files below, the directory is retained and no WARN is emitted:
    • kafka-streams-process-metadata 
    • .lock
  • If any other files remain, the directory is retained and a WARN is logged indicating unexpected files prevented complete cleanup.
  • If the directory contents cannot be determined (for example, listing returns null), the directory is retained and a WARN is logged.
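The rules above can be sketched as a small decision function. This is only an illustrative sketch, not the actual Kafka Streams implementation: the class name CleanupDecisionSketch, the Outcome enum, and the decide methods are hypothetical names introduced here, and the real logging and deletion calls are omitted.

```java
import java.io.File;
import java.util.Set;

// Hypothetical sketch of the final directory deletion decision described above.
// None of these names exist in Kafka Streams; they are assumptions for illustration.
class CleanupDecisionSketch {

    enum Outcome { DELETE, RETAIN_SILENTLY, RETAIN_WITH_WARN }

    // The two expected metadata file names named in this KIP.
    static final Set<String> EXPECTED_FILES =
        Set.of("kafka-streams-process-metadata", ".lock");

    // Decide what to do given the names of the entries left in state.dir/application.id.
    static Outcome decide(String[] remainingEntries) {
        if (remainingEntries == null) {
            // Contents could not be determined: retain and warn.
            return Outcome.RETAIN_WITH_WARN;
        }
        if (remainingEntries.length == 0) {
            // Empty directory: safe to delete.
            return Outcome.DELETE;
        }
        for (String entry : remainingEntries) {
            if (!EXPECTED_FILES.contains(entry)) {
                // Any unexpected entry prevents complete cleanup: retain and warn.
                return Outcome.RETAIN_WITH_WARN;
            }
        }
        // Only expected metadata files remain: retain without a WARN.
        return Outcome.RETAIN_SILENTLY;
    }

    // Convenience overload; File#list returns null if the path is not a
    // directory or an I/O error occurs, which maps to the "cannot be determined" rule.
    static Outcome decide(File appDir) {
        return decide(appDir.list());
    }
}
```

Keeping the decision separate from the filesystem and logging side effects is a design choice of this sketch; it makes the four rules easy to check in isolation.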



Compatibility, Deprecation, and Migration Plan

This change is backward compatible in terms of API surface, as there are no signature changes.

User-visible behavior changes are limited to the final application directory deletion step and logging:

  • Previously, KafkaStreams#cleanUp could emit a WARN when state.dir/application.id could not be deleted because it was not empty, even if the remaining entries were expected metadata files.
  • With this change, when the only remaining entries are kafka-streams-process-metadata and/or .lock, KafkaStreams#cleanUp does not emit a WARN. This situation is treated as a successful cleanup of local state.


Potential impact:

  • Users and operators who have alerts or log-based monitors on the previous WARN message may need to adjust those monitors, as the WARN will no longer be emitted for the expected-metadata-only case.
  • Users who require a full local reset including process identity should explicitly remove state.dir/application.id (or delete kafka-streams-process-metadata) as part of their operational procedure, since cleanUp may retain these metadata files.

Note that this is not a new operational requirement. Even prior to this change, KafkaStreams#cleanUp could fail to delete state.dir/application.id when expected metadata files (for example, kafka-streams-process-metadata and/or .lock) remained, so a full local reset including process identity already required explicitly removing the directory or the metadata file.
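For operators who need such a full local reset, the explicit removal step could look like the following sketch. It uses only standard java.nio.file calls; the class name FullLocalResetSketch and the method deleteRecursively are hypothetical, and the caller is assumed to pass the resolved state.dir/application.id path. It should only be run when no Kafka Streams process is using that directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical operational helper, not part of Kafka Streams.
class FullLocalResetSketch {

    // Recursively deletes the given application directory, including
    // kafka-streams-process-metadata and .lock if they are present.
    static void deleteRecursively(Path appDir) throws IOException {
        if (!Files.exists(appDir)) {
            return; // nothing to remove
        }
        try (Stream<Path> paths = Files.walk(appDir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

A typical sequence under these assumptions would be to close the KafkaStreams instance, call cleanUp, and then call deleteRecursively on the application directory. Removing the process identity metadata means the next start generates a new process identity, which is the trade-off described under Rejected Alternatives.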


Deprecation

No deprecations are introduced.


Migration

No migration is required.


Test Plan

Add and/or extend unit tests to validate the final directory deletion decision and logging:

  • Case 1: Remaining entries are only kafka-streams-process-metadata
    • Verify that cleanUp retains state.dir/application.id and does not emit a WARN.
  • Case 2: Remaining entries are only .lock
    • Verify that cleanUp retains state.dir/application.id and does not emit a WARN.
  • Case 3: Remaining entries are only kafka-streams-process-metadata and .lock
    • Verify that cleanUp retains state.dir/application.id and does not emit a WARN.
  • Case 4: Remaining entries include an unexpected file
    • Verify that cleanUp retains state.dir/application.id and emits a WARN indicating unexpected files remain.


Documentation Plan

  • KafkaStreams#cleanUp JavaDoc
    • Clarify that the final application directory deletion step may retain state.dir/application.id when expected metadata files remain.
    • Clarify that this situation is not considered a cleanup failure and does not emit a WARN.
  • Streams application reset tool documentation (e.g. https://kafka.apache.org/41/streams/developer-guide/app-reset-tool/)
    • Align the described reset procedure with the updated cleanUp semantics, clarifying that cleanUp may retain the application directory due to expected metadata files.


Rejected Alternatives

The discussion considered three options for KafkaStreams#cleanUp semantics. The approach described in Proposed Changes was chosen; the other two were rejected:

  • Delete the process identity metadata file during cleanUp before deleting state.dir/application.id
    • Summary: Remove kafka-streams-process-metadata so the application directory can be fully deleted.
    • Rejected because it resets process identity on the next start, which breaks the process identity persistence introduced by KAFKA-10716 and can increase task movement and rebalances after restart. It is also an observable behavior change for users who rely on stable process identity.
  • Add a parameter (or API variant) to KafkaStreams#cleanUp to choose between preserving metadata and fully deleting the application directory
    • Summary: Make the semantics explicit by allowing users to select a full delete mode versus preserve-metadata mode.
    • Rejected for this KIP to keep the API simple and the change minimal. Introducing a new public parameter or overload expands the API surface and increases user complexity. This can be considered as a follow-up KIP if a strong use case emerges.


Appendix: Expected filesystem state

Before cleanUp (example):

  • state.dir/application.id/ 

    • kafka-streams-process-metadata 

    • .lock 

    • 0_0/ 

      • RocksDB stores, checkpoint files, etc.

    • 0_1/ 

    • global/ 

After cleanUp:

  • state.dir/application.id/ 

    • kafka-streams-process-metadata 

    • .lock 
