| ID | IEP-64 |
| Author | |
| Sponsor | |
| Created | |
| Status | DRAFT |
Infrastructure developed in IEP-35 introduces a convenient way to work with metrics. All metrics are registered in a centralized registry, where names, descriptions and types are stored.
However, it lacks a way to specify units of measurement for the metrics. All metrics are provided as raw numbers, and it is up to the user to interpret them. The lack of units specified in a standardized form makes this interpretation problematic.
We hope that developers specify units in the description or in comments, but not everybody does. Currently, in general, you need to go to the code where the metric is implemented and see how it is calculated to determine its units.
Different metrics of the same kind use different units. For example, buckets in tx.nodeSystemTimeHistogram use milliseconds, while cache.<cacheName>.GetTime uses nanoseconds for some reason.
When setting up a monitoring infrastructure, users need this information, since it makes a big difference whether a metric uses milliseconds or nanoseconds, bytes or number of packets. Monitoring tools could use this information to scale the measurements properly, converting bytes into megabytes, milliseconds into minutes, etc. There is currently no source of this information, so setting up a monitoring dashboard requires searching through the Ignite source code.
A MetricUnit class should be introduced. It should be possible to acquire instances for the units of metrics such as the following:
io.dataregion.PagesReplaced, tx.totalNodeSystemTime, cache.RebalanceStartTime, cache.TotalRebalancedBytes, sys.CpuLoad.
It should also be possible to represent units like "bytes per second" and "number of times per second". A MetricUnit.per(...) method should be implemented that creates a new MetricUnit instance corresponding to the ratio.
Example metrics: io.dataregion.PagesReplaceRate, cache.RebalancingBytesRate.
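The following is a minimal sketch of what such a class could look like, not a proposed final API. Only the class name MetricUnit and the method name per come from this proposal. The constant names, the symbols, the symbol() accessor, and the choice to make per an instance method are assumptions made for illustration.

```java
import java.util.Objects;

/** Hypothetical sketch of a unit-of-measurement descriptor for metrics. */
final class MetricUnit {
    // Base units. Names and symbols are illustrative assumptions.
    static final MetricUnit COUNT = new MetricUnit("count");
    static final MetricUnit BYTES = new MetricUnit("B");
    static final MetricUnit NANOSECONDS = new MetricUnit("ns");
    static final MetricUnit MILLISECONDS = new MetricUnit("ms");
    static final MetricUnit SECONDS = new MetricUnit("s");

    private final String symbol;

    private MetricUnit(String symbol) {
        this.symbol = symbol;
    }

    /** Creates a new unit representing the ratio of this unit to the given denominator. */
    MetricUnit per(MetricUnit denominator) {
        Objects.requireNonNull(denominator, "denominator");
        return new MetricUnit(symbol + "/" + denominator.symbol);
    }

    /** Short textual form of the unit, e.g. "B/s". */
    String symbol() {
        return symbol;
    }

    @Override public boolean equals(Object o) {
        return o instanceof MetricUnit && symbol.equals(((MetricUnit) o).symbol);
    }

    @Override public int hashCode() {
        return symbol.hashCode();
    }

    @Override public String toString() {
        return symbol;
    }

    public static void main(String[] args) {
        // A unit that a rate metric such as cache.RebalancingBytesRate might carry.
        System.out.println(MetricUnit.BYTES.per(MetricUnit.SECONDS)); // prints "B/s"
        // A unit that a rate metric such as io.dataregion.PagesReplaceRate might carry.
        System.out.println(MetricUnit.COUNT.per(MetricUnit.SECONDS)); // prints "count/s"
    }
}
```

In this sketch, units are compared by symbol, so two independently built "B/s" instances are equal. A real implementation would likely also need conversion factors between related units (for example, milliseconds and nanoseconds) so that monitoring tools can rescale values.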