Authors: Greg Harris, Ivan Yurchenko, Jorge Quilcate, Giuseppe Lillo, Anatolii Popov, Juha Mynttinen, Josep Prat, Filip Yonov

Status

Current state: Voting

Discussion thread: here

Vote thread: here

JIRA: KAFKA-19161

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).


Motivation

Background

The Apache Kafka protocol has become a successful base for building streaming applications, and has attracted workloads that push the Apache Kafka implementation to new limits. The Apache Kafka implementation is designed around low-durability block storage and direct replication, and provides strong consistency and high durability backed by commodity hardware.

Currently, Apache Kafka is often operated in cloud hyperscaler environments where high-reliability object storage is available and more cost-effective than block storage for equivalent workloads. The existing Tiered Storage feature (KIP-405) provides the capability to use object storage for inactive segments, and has seen widespread adoption. However, Tiered Storage does not remove the need for replication of active segments, which is the most substantial infrastructure cost for Apache Kafka operators on hyperscalers today.

Two of the three major cloud providers charge for cross-availability-zone traffic.

Even in cases where network traffic is not charged, we believe the operational benefits of Diskless topics remain appealing to Kafka users, for example:

  • Cluster scalability improves when less data must be stored on broker disks and rebalanced between brokers.
  • Object storage typically offers better durability than local disks.

Multiple protocol-compatible alternatives to Apache Kafka now use object storage to fully replace direct replication and substantially lower the cost to operate a cluster on a hyperscaler cloud. These alternatives are finding market success and their adoption is rising, showing a general market interest in this optimization.

Motivating Question

Should the Apache Kafka implementation pursue the object storage optimization that is present in all of these alternatives?

Yes: the Apache Kafka reference implementation should incorporate this innovation and provide the capability to replace block storage with object storage.

New Capabilities

Diskless Topics allow Apache Kafka operators on hyperscalers to:

  • Eliminate inter-zone data transfer costs from replication
  • Eliminate inter-zone ingress and egress costs for data from producers and to consumers

Diskless topics will allow all Apache Kafka operators to:

  • Write through to object storage, avoiding the use of local disks for durable storage
  • Choose pluggable commodity storage backends suited to their environment
  • Trade off cost and latency on a per-topic basis
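As a sketch of what a per-topic cost/latency trade-off could look like: the exact interface is deliberately left to the follow-up KIPs, and the `diskless.enable` configuration name below is purely hypothetical, used here only for illustration.

```shell
# Hypothetical sketch: create a Diskless topic alongside classic topics
# in the same cluster. The "diskless.enable" topic config is illustrative
# only; the real public interface will be defined in the follow-up KIPs.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic clickstream \
  --partitions 12 \
  --config diskless.enable=true

# A latency-sensitive topic in the same cluster can remain classic:
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic payments \
  --partitions 3 --replication-factor 3
```

The point of the sketch is that the choice is made at topic creation time, mirroring how retention and segment settings are already chosen per topic.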

By incorporating this feature within Apache Kafka specifically:

  • More operators will have access to this feature under the Apache 2.0 license
  • The community can maintain the feature, reducing dependence on vendors
  • Protocol changes for further optimizations, such as producer rack-awareness, become possible
  • The Apache Kafka implementation can maintain or grow its market share, and avoid obsolescence

With Diskless Topics, Apache Kafka will become a streaming engine that supports a wide spectrum of latencies, balancing cost and performance for an extremely diverse set of workloads.

Disk usage and lack thereof

It's important to clarify what exactly "diskless" means. "Diskless" primarily refers to not using broker disks as the primary durable storage of user data. However, diskless topics still require some broker disk usage, particularly:

  • KRaft metadata, as for classic topics
  • batch metadata, which may be stored on broker disk depending on the batch coordinator implementation (e.g. in a Kafka topic)
  • diskless topic user data, which may be stored on disk while being copied to tiered storage
  • diskless topic user data, which may be cached on disk to serve consumers

It is also worth noting that Diskless topics are not meant to change the Kafka storage API; instead, they introduce a separate request-processing path that handles access to remote storage.

In short, Diskless is to “No Disks” as Serverless is to “No Servers”: attached disks become a less important abstraction for operators, but they are still functionally present.

Proposed Changes

This KIP will not require any changes to the codebase or documentation upon acceptance. By accepting this KIP, we will come to a consensus on the need for this feature, and its end-user requirements, but not any specific implementation details.

For details on the planned implementation, please see the integral follow-up KIPs:

Each of these KIPs will have its own discussion and voting. Effort should be focused on this KIP first; only after the community has broadly agreed that this feature is wanted should the particular implementations be designed. These KIPs will influence one another, and together they constitute the minimum viable form of this feature.

These KIPs will be aligned with the values of the Kafka community, and propose a long-lasting and extensible design that composes well with existing functionality. This will involve both substantial re-use of existing code, and refactoring in order to ensure that this feature does not need substantial rework in the future.

Further Work

In addition to the minimum viable implementation described in the integral KIPs above, below are some optional follow-ups. These are features which are not critical to the core functionality, but are natural extensions, further optimizations, and new innovations which are unlocked once the core functionality is in place.

  • Topic Type Changing: Allow classic topics to be converted to Diskless and vice versa.
  • Broker Roles: Specialize brokers for produce/consume/coordination/compaction operations, permitting heterogeneous Kafka clusters.
  • Parallel Produce Handling: Process multiple Produce requests concurrently, increasing potential throughput in high-latency environments.
  • Iceberg Format: Allow massively parallel processing of at-rest topic data. This work enables a pluggable storage interface where one can innovate in the log format layer independently.
  • Dynamically Enabled Diskless: Allow extremely easy migrations to try out and revert Diskless.
  • Multi-region active-active topics with automatic failover.

These components are less defined, and currently don’t have KIPs attached. Contributions are welcome to either suggest other extensions, or design one of the above extensions. Design, discussion, and voting on these is expected to begin after the integral KIPs are complete.

Public Interfaces

This KIP does not propose any new public interfaces, but its sub-KIPs will.

Compatibility, Deprecation, and Migration Plan

This will be a backwards-compatible upgrade for existing Kafka Clusters. Diskless Topics will also support all existing APIs with the same external semantics as non-Diskless topics, including:

  • Ordering
  • Idempotence
  • Transactions
  • Consumer Groups/Offsets
  • Queues/Share Groups
  • Tiered Storage

Broadly, Diskless topics are intended to be semantically interchangeable with non-Diskless topics, while enabling latency and cost tradeoffs.
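To illustrate what semantic interchangeability means in practice: the client-facing workflow would be identical for both topic types, with only the broker-side storage path differing. A sketch, assuming a Diskless topic named `my-diskless-topic` already exists:

```shell
# Producing and consuming are unchanged regardless of whether the topic
# is Diskless; existing tooling and client code work as-is.
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic my-diskless-topic

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-diskless-topic --from-beginning
```

No client upgrades or code changes would be required to adopt Diskless topics.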

Specific compatibility, deprecation, and migration details will be covered in follow-up KIPs.

Test Plan

There will be no modifications to existing tests for this KIP. Follow-up KIPs will have specific test plans.

Documentation Plan

There will be no modifications to the documentation for this KIP. Follow-up KIPs will have specific plans for changes to documentation.

Rejected Alternatives

Drop support for non-Diskless topics

The current topic implementation is still appropriate for low-latency use cases, and Diskless topics are not always a suitable replacement. Dropping non-Diskless topics would also be a backwards-incompatible change, and one that would limit how this new feature could be rolled out.

Additionally, all existing functionality in Kafka depends on the existing topic implementation, including the KRaft Metadata, Consumer Offsets & Group Coordination, Transactions, and Share Groups. Dropping support for non-Diskless topics would require additional scope to integrate this existing functionality.

Enable Diskless on a per-cluster basis instead of per-topic

For users, a cluster is an administrative boundary: one with a unified resource namespace, permissions system, and physical deployment. Users within one administrative boundary may have distinct performance requirements and wish to choose different underlying storage parameters. This mirrors the existing topic-level configurability (retention, segment rolling, etc.).

Resolve some but not all rack transfer costs

Currently there are external techniques for avoiding transfer costs with non-Diskless topics, such as single-rack topics. However, these force users to make durability, availability, semantic, usability, and other application-specific compromises. If Diskless topics do not eliminate all rack transfer costs, users will still need to make these compromises with Diskless topics. By eliminating these transfer costs with internal design changes, we can offer a better user experience overall.

Additionally, if there is pressure to eliminate costs broadly, a solution which only partially resolves these costs may be shortly replaced with one that does, duplicating effort in the community. By striving for a holistic solution, we can make best use of the Kafka community's limited resources.

Do Nothing

As time progresses, this will become the single most substantial feature missing from the upstream implementation, driving high-scale and cloud users toward Apache Kafka alternatives, which will grow in total market share. This would further fragment the control that Apache Kafka has over the Kafka protocol, and Apache Kafka may lose its mandate over the protocol entirely. That could mean needing to coordinate with forks on new functionality, a proliferation of hard protocol forks, or re-centralization of the protocol under a standards organization. We should take steps now to avoid or delay this outcome.
