The following features related to Tiered Storage (KIP-405) are available in the GA release:
- Remote Storage at Cluster and Topic Level: To leverage the remote tier archival and retrieval capabilities, you need to enable remote storage at both the cluster level and the specific topic level. By enabling remote storage at the cluster level, you activate the feature for the entire Kafka cluster. Subsequently, by enabling remote storage for a particular topic, you designate it for archival and retrieval through remote storage.
- Seamless Client Compatibility: With Tiered Storage enabled for a topic, no changes are required in Kafka clients to read from that topic.
- Non-intrusive Integration: We have ensured that the tiered storage feature does not cause any disruptions to other Kafka functionalities when it is disabled at a cluster level. Your existing Kafka setup will continue to function smoothly, as the default configuration for "remote.log.storage.system.enable" is set to false, ensuring the tiered storage feature remains inactive.
- Monitoring for Remote Storage feature: This feature incorporates several new metrics to help you keep track of tiered storage operations effectively. These metrics are available exclusively when the Remote Storage feature is enabled for your cluster.
- Quotas: With the tiered storage quota feature, users can define a maximum limit on the rate at which log segments are transferred to or retrieved from the remote storage.
- LocalTieredStorage Implementation: As part of the default implementation, we introduce 'LocalTieredStorage,' a local file-based RemoteStorageManager. LocalTieredStorage facilitates the simulation of remote storage behavior in a controlled and isolated environment during testing.
- Cluster Upgrades: Clusters upgrading from any previous version to version 3.9.0 can enable Tiered Storage for topics created after the upgrade to 3.6.0. Topics created on or after version 2.8.0 are also eligible for Tiered Storage. However, for topics created before version 2.8.0, Tiered Storage cannot be enabled after upgrading the cluster to version 3.6.0. To utilize Tiered Storage for these older topics, manual steps are required: the older segments need to be deleted before Tiered Storage can be activated on the topic. Kafka does not automatically block enabling Tiered Storage on such topics, so the responsibility for performing these steps lies with the user.
- Enabling and Disabling Tiered Storage for a Cluster: We provide the option to enable or disable Tiered Storage for an entire Kafka cluster in both ZK and KRaft modes. Additionally, you can enable or disable Tiered Storage for individual topics, but this feature is only available in KRaft mode. The ability to manage Tiered Storage at the topic level is not supported in ZK mode.
- Data Deletion upon Topic Deletion: When you delete a topic that is utilizing Tiered Storage, the data associated with that topic will be automatically deleted from remote storage.
- JBOD support: The tiered storage feature also works in clusters configured with multiple log directories (i.e., the JBOD feature).
- Integration with MirrorMaker 2: MM2 continues to work as before when tiered storage is enabled on the source topic. That is, when the source topic has tiered storage enabled, the whole log, including the data in remote storage, is mirrored to the target cluster. The remote log metadata stored in the "__remote_log_metadata" topic (when using the default TopicBasedRemoteLogMetadataManager) is excluded from MM2 by default.
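As a rough illustration of the two-level enablement and the quota settings described in the list above, the following Python sketch assembles the relevant settings and renders them in properties form. It is not a complete broker configuration. The helper names are hypothetical. The property names other than "remote.log.storage.system.enable", the LocalTieredStorage class path, and the quota value are assumptions based on the public Kafka configuration reference; verify them against the documentation for your Kafka version before use.

```python
# Illustrative sketch only. Helper names are hypothetical; property names other
# than "remote.log.storage.system.enable" are assumptions to verify against the
# configuration reference for your Kafka version.

def broker_tiered_storage_properties(copy_quota_bytes_per_sec=None,
                                     fetch_quota_bytes_per_sec=None):
    """Return broker-level settings that switch tiered storage on for the cluster."""
    props = {
        # Cluster-level switch; the default is false, which keeps the feature inactive.
        "remote.log.storage.system.enable": "true",
        # Assumed fully qualified name of the file-based RemoteStorageManager
        # intended for testing, not for production use.
        "remote.log.storage.manager.class.name":
            "org.apache.kafka.server.log.remote.storage.LocalTieredStorage",
    }
    # Optional quotas limiting the transfer rate to and from remote storage.
    if copy_quota_bytes_per_sec is not None:
        props["remote.log.manager.copy.max.bytes.per.second"] = str(copy_quota_bytes_per_sec)
    if fetch_quota_bytes_per_sec is not None:
        props["remote.log.manager.fetch.max.bytes.per.second"] = str(fetch_quota_bytes_per_sec)
    return props


def topic_tiered_storage_config():
    """Return the topic-level override that designates one topic for remote storage."""
    return {"remote.storage.enable": "true"}


def render_properties(props):
    """Render a dict as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # 10 MiB/s copy quota is a placeholder value.
    print(render_properties(broker_tiered_storage_properties(copy_quota_bytes_per_sec=10 * 1024 * 1024)))
    print(render_properties(topic_tiered_storage_config()))
```

The broker-level settings alone do not tier any data; a topic is only archived to remote storage once its own override is also set.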
While the GA release of Tiered Storage offers the opportunity to try out this feature, it is important to be aware of the following limitations:
- Compacted Topics: Currently, tiered storage is not available for compacted topics. If you attempt to enable remote storage on a compacted topic, you will receive a configuration exception. If a topic was previously compacted and was later changed to a non-compacted topic, enabling remote storage will not throw a configuration exception. However, this case is still not supported, because tiered storage assumes that the topic has never been compacted.
- Client Compatibility: All Kafka clients, regardless of their version, can continue to produce and consume records from topics utilizing Tiered Storage. However, clients with versions prior to 3.0 are limited in performing administrative actions, such as enabling Tiered Storage on a topic (for example, they might change the configuration directly in ZK using the --zookeeper option). To successfully enable Tiered Storage for a topic, clients must be running Kafka version 3.0 or later, as administrative actions related to Tiered Storage are only supported on clients from version 3.0 onwards.
- Log segments without producerSnapshot file: When log segments are missing their producer snapshot file while a transactional or idempotent producer is in use, the remote log manager breaks its contract with the RSM API, which results in a failure. Missing snapshot files can occur when the topic was created before v2.8.0.
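Because some of the limitations above are not enforced by Kafka itself, an operator may want a manual pre-flight check before enabling Tiered Storage on a topic. The following is a minimal sketch under stated assumptions: the function name and the two boolean flags are hypothetical, and the caller is assumed to know the topic's history (whether it was ever compacted, and whether it was created before 2.8.0), since that history cannot be read from the current topic configuration.

```python
# Hypothetical pre-flight check; not part of Kafka. The history flags must be
# supplied by the operator because they are not visible in the topic config.

def tiered_storage_blockers(topic_config, was_compacted=False, created_before_2_8=False):
    """Return a list of reasons why tiered storage should not be enabled on a topic."""
    blockers = []

    # cleanup.policy may be a comma-separated list such as "compact,delete".
    policy = topic_config.get("cleanup.policy", "delete")
    policies = {item.strip() for item in policy.split(",")}
    if "compact" in policies:
        blockers.append("topic is compacted; enabling remote storage raises a configuration exception")

    # No exception is thrown in this case, but it is still unsupported.
    if was_compacted and "compact" not in policies:
        blockers.append("topic was previously compacted; unsupported even though no exception is thrown")

    # Segments from before 2.8.0 may lack producer snapshot files.
    if created_before_2_8:
        blockers.append("topic predates 2.8.0; older segments must be deleted first")

    return blockers


if __name__ == "__main__":
    for reason in tiered_storage_blockers({"cleanup.policy": "delete"}, was_compacted=True):
        print(reason)
```

An empty list means none of the limitations listed here apply; it does not guarantee that enabling Tiered Storage will succeed.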
For the latest information regarding known issues, their resolutions, and possible workarounds, please visit the parent tracker for the next release of the Tiered Storage feature: KAFKA-16947. We are committed to addressing these issues and providing a reliable Tiered Storage experience, and your feedback is incredibly valuable in helping us improve the feature.