Current state: Under Discussion
Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread]
JIRA: KAFKA-1489
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
This KIP is related to KAFKA-1489: Global threshold on data retention size.
In dynamic situations where topics are added in unpredictable ways, the existing log retention parameters are not sufficient to prevent out-of-disk conditions from occurring. Consider, for example, a Kafka cluster that serves the needs of a team of developers working on new Kafka-based services. Such users may begin pushing arbitrary amounts of content into new topics at any time.
The existing log retention parameters reflect a frame of reference in which topic usage is predictable, and an administrator can make reasonable projections and choose configuration values based on provisioned disk capacity.
The alternative frame of reference reflected in this proposal includes the following assumptions and priorities:
We propose adding a new broker configuration option:
val LogRetentionDiskUsagePercentProp = "log.retention.disk.usage.percent"
val LogRetentionDiskUsagePercentDoc = "The maximum percentage of disk space allowed to be in use " +
  "(per-disk). Deletes oldest segments (across all topics) to maintain this usage ceiling."
The default value is 100, which effectively disables the feature.
Add a log retention parameter that sets a soft upper limit on the percentage of disk space that can be in use. After the other log retention policies have been applied in each cleanup cycle, compute the amount of disk space in use on each physical device. For each device that is over the configured usage limit, compute the number of "excess bytes" to free and delete as many of the globally oldest segments as is necessary to reach this goal.
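The cleanup step described above can be sketched roughly as follows. This is a minimal illustration, not the proposed patch. The `Segment` case class, the `DiskUsageRetention` object and both function names are hypothetical, and segment age is assumed to be approximated by last-modified time. A real implementation would also need to skip each partition's active segment.

```scala
object DiskUsageRetention {
  // Hypothetical, simplified view of a log segment; this is not Kafka's LogSegment class.
  final case class Segment(topic: String, device: String, lastModifiedMs: Long, sizeBytes: Long)

  // Bytes that must be freed so that usage on one device drops back to the ceiling.
  // A ceiling of 100 never yields excess bytes, which is why the default disables the feature.
  def excessBytes(totalBytes: Long, usedBytes: Long, maxUsagePercent: Int): Long = {
    val allowedBytes = (BigInt(totalBytes) * maxUsagePercent / 100).toLong
    math.max(0L, usedBytes - allowedBytes)
  }

  // Select the oldest segments on one device, across all topics, until the
  // selected segments cover at least `bytesToFree` bytes.
  def selectForDeletion(segments: Seq[Segment], device: String, bytesToFree: Long): Seq[Segment] = {
    val oldestFirst = segments.filter(_.device == device).sortBy(_.lastModifiedMs)
    // Running total of bytes already covered before each segment is considered.
    val freedBefore = oldestFirst.scanLeft(0L)(_ + _.sizeBytes)
    oldestFirst.zip(freedBefore).takeWhile { case (_, freed) => freed < bytesToFree }.map(_._1)
  }
}
```

Under these assumptions, a cleanup cycle would call `excessBytes` once per device after the existing retention policies have run, and pass any positive result to `selectForDeletion`. Because whole segments are deleted, the bytes actually freed can exceed the computed excess.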
There are no known migration issues.
The feature is disabled by default. If enabled, it may supersede the following time-based log retention parameters (just like the existing size-based parameters):
The parameter could be expressed as "minimum percentage of disk space to keep free". It is expressed instead as "maximum percentage of disk space to allow to be used" because that seems more congruent with the other size-based log retention parameters, which are expressed as maximums.
The parameter could be expressed as a global maximum byte count. Ops monitoring tools are typically configured to trigger alerts based on disk usage percentages, however. Adopting the same units makes configuration slightly easier. It also allows for disk capacity to be increased without the need for reconfiguration, in some cases.
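To illustrate the point about capacity changes, the following small sketch shows how a single percentage setting yields a different byte budget on each disk size. The `UsageCeiling` object and `allowedBytes` function are hypothetical names, and the 85 percent value is only an example.

```scala
object UsageCeiling {
  // Byte budget implied by a percentage ceiling on a disk of the given capacity.
  // BigInt avoids overflow when multiplying large capacities by the percentage.
  def allowedBytes(capacityBytes: Long, maxUsagePercent: Int): Long =
    (BigInt(capacityBytes) * maxUsagePercent / 100).toLong
}
```

With a ceiling of 85, `UsageCeiling.allowedBytes` gives a budget of 850 bytes per 1000 bytes of capacity, so doubling the disk doubles the budget with no change to the broker configuration. A byte-count parameter would have to be edited by hand to get the same effect.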
The parameter is defined as "percentage of disk space in use", not "percentage of disk space in use by Kafka". The latter definition would be somewhat more expensive to compute each cycle. Perhaps more importantly, it would weaken the guarantee that the parameter is designed to provide - that the disks will never ever run out of free space.
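As a rough sketch of why the whole-disk definition is cheap to compute, the usage of the device holding a log directory can be read from the standard `java.nio.file.FileStore` API without walking any files. The `DiskUsage` object and its method names are hypothetical. Using `getUsableSpace` rather than `getUnallocatedSpace` to derive "space in use" is an assumption, not something the proposal specifies.

```scala
import java.nio.file.{Files, Path}

object DiskUsage {
  // Percentage of a device in use, given its total and usable byte counts.
  def usedPercent(totalBytes: Long, usableBytes: Long): Double =
    if (totalBytes <= 0L) 0.0
    else 100.0 * (totalBytes - usableBytes) / totalBytes

  // Usage of the device that holds `logDir`, counting every file on that device,
  // not only the files written by Kafka.
  def usedPercent(logDir: Path): Double = {
    val store = Files.getFileStore(logDir)
    usedPercent(store.getTotalSpace(), store.getUsableSpace())
  }
}
```

A "used by Kafka" figure would instead require summing segment sizes under every log directory on the device in each cycle, and it would not account for space consumed by other processes.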