Kafka runs on the JVM, but the Kafka ecosystem has no JVM-based exporter. I wrote one with Spring Boot for use at work and am happy to share it.

Status

Current state: Under Discussion

Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread]

JIRA: here [Change the link from KAFKA-1 to your own ticket]

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka is an excellent message queue and data pipeline that runs on the JVM, but it has no JVM-based exporter. For the future of the Kafka ecosystem,
Apache needs an official exporter like https://github.com/apache/rocketmq-exporter.
I wrote one for use at work and hope to contribute it to Apache. JMX exposes a large number of metrics, and the exporter config controls which of them are collected.

Public Interfaces

How to configure the exporter

Common config

server:
  port: 5650
spring:
  application:
    name: kafka-exporter
  profiles:
    active: dev
  http:
    encoding:
      charset: UTF-8
      enabled: true
      force: true
logging:
  config: classpath:logback.xml

task:
  count: 8
  brokerTopicMetrics:
    cron: 1/15 * * * * ?
  lagMetrics:
    cron: 1/15 * * * * ?
  jvmMetrics:
    cron: 1/15 * * * * ?
  replicaMetrics:
    cron: 1/15 * * * 12 ? ## month field is 12, so this task fires only in December
  networkMetrics:
    cron: 1/15 * * * * ?
  logFlushMetrics:
    cron: 1/15 * * * * ?
  kafkaControllerMetrics:
    cron: 1/15 * * * 12 ? ## month field is 12, so this task fires only in December
  kafkaClusterMetrics:
    cron: 1/15 * * * 12 ? ## month field is 12, so this task fires only in December
kafka-exporter:
  kafka-versions.0.10.2.0: 1 ## different Kafka versions use different API versions
  kafka-versions.0.10.1.1: 1 ## different Kafka versions use different API versions
  kafka-versions.1.0.0: 1
  canSendToPaladin: true
  ## allowCollectMetrics and forbidCollectMetricNames for the tasks configured in this yml
  allowCollectMetrics.brokerTopicMetrics:
    - kafka.server:type=BrokerTopicMetrics,name=*
    - kafka.server:type=BrokerTopicMetrics,name=*,topic=*
  forbidCollectMetricNames.brokerTopicMetrics:
    - FetchMessageConversionsPerSec
  allowCollectMetrics.jvmMetrics:
    - java.lang:type=GarbageCollector,name=*
    - java.lang:type=Threading
  forbidCollectMetricNames.jvmMetrics:
    - Code Cache
  allowCollectMetrics.replicaMetrics:
    - kafka.server:type=ReplicaManager,name=*
  forbidCollectMetricNames.replicaMetrics:
    - aa
  allowCollectMetrics.networkMetrics:
    - kafka.network:type=RequestMetrics,name=*,request=*
    - kafka.network:type=RequestMetrics,name=*,request=*,version=* # for 2.0.0
    - kafka.network:type=SocketServer,name=*
    - kafka.network:type=RequestChannel,name=*
    - kafka.server:type=KafkaRequestHandlerPool,name=*
  forbidCollectMetricNames.networkMetrics:
    - MessageConversionsTimeMs # normally, use the metric name
    - TemporaryMemoryBytes
    - MessageConversionsTimeMs
    - ThrottleTimeMs
    - TotalTimeMs
    - LocalTimeMs
    - RemoteTimeMs
    - RequestBytes
    - ResponseQueueTimeMs
    - ResponseSendTimeMs
  forbidCollectMetricNames.RequestMetrics:
    - AlterConfigs
    - AlterReplicaLogDirs
    - ApiVersions
    - ControlledShutdown
    - CreateAcls
    - CreateDelegationToken
    - DeleteAcls
    - DeleteRecords
    - DescribeAcls
    - DescribeConfigs
    - DescribeDelegationToken
    - DescribeLogDirs
    - EndTxn
    - ExpireDelegationToken
    - InitProducerId
    - OffsetForLeaderEpoch
    - RenewDelegationToken
    - SaslAuthenticate
    - SaslHandshake
    - StopReplica
    - TxnOffsetCommit
    - WriteTxnMarkers
    - AddOffsetsToTxn
  allowCollectMetrics.logFlushMetrics:
    - kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
    - kafka.log:type=LogCleanerManager,name=*
  forbidCollectMetricNames.logFlushMetrics:
    - aa
  allowCollectMetrics.kafkaControllerMetrics:
    - kafka.controller:type=KafkaController,name=*
  forbidCollectMetricNames.kafkaControllerMetrics:
    - aa
  allowCollectMetrics.kafkaClusterMetrics:
    - kafka.cluster:type=Partition,name=*,topic=*,partition=*
  forbidCollectMetricNames.kafkaClusterMetrics:
    - aa
  jmx-excludes-metrics.brokerTopicMetrics:
    - aa
  jmx-excludes-attrs.BrokerTopicMetrics:
    - aa
  jmx-excludes-attrs-global:
    - EventType
    - RateUnit
    - LatencyUnit
    - 50thPercentile
    - 75thPercentile
    - 98thPercentile
    - LastGcInfo
    - MemoryPoolNames
    - ObjectName
    - Valid
    - Name
    - ThreadAllocatedMemoryEnabled
    - ThreadAllocatedMemorySupported
    - ThreadContentionMonitoringEnabled
    - AllThreadIds
    - ThreadCpuTimeSupported
    - ThreadCpuTimeEnabled
    - ThreadContentionMonitoringSupported
    - CurrentThreadCpuTimeSupported
    - ObjectMonitorUsageSupported
    - SynchronizerUsageSupported
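
The allowCollectMetrics entries above are JMX ObjectName patterns, so the allow/forbid filtering can be sketched with the JDK's own javax.management.ObjectName matching. This is only a minimal sketch, not the exporter's actual code: the class MetricFilterSketch and the method shouldCollect are hypothetical names, and it assumes that forbidCollectMetricNames entries are compared against the MBean's "name" key.

```java
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for illustration; not taken from the exporter's source.
final class MetricFilterSketch {

    // True when the MBean matches at least one allow pattern and its
    // "name" key is not in the forbidden list (assumed semantics).
    static boolean shouldCollect(String mbean, List<String> allowPatterns, List<String> forbiddenNames) {
        try {
            ObjectName candidate = new ObjectName(mbean);
            String name = candidate.getKeyProperty("name");
            if (name != null && forbiddenNames.contains(name)) {
                return false;
            }
            for (String pattern : allowPatterns) {
                // ObjectName.apply tests a concrete name against a pattern.
                if (new ObjectName(pattern).apply(candidate)) {
                    return true;
                }
            }
            return false;
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        List<String> allow = List.of(
                "kafka.server:type=BrokerTopicMetrics,name=*",
                "kafka.server:type=BrokerTopicMetrics,name=*,topic=*");
        List<String> forbid = List.of("FetchMessageConversionsPerSec");
        System.out.println(shouldCollect(
                "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=orders", allow, forbid));
    }
}
```

A pattern such as name=* matches only MBeans with exactly the keys type and name, which is why the per-topic MBeans need the second pattern with topic=*.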

Kafka clusters you want to monitor

kafka-exporter:
  zookeepers:
    - cluster-name: cluster-name-of-your-kafka-brokers ## cluster name
      zk-ip-and-port: 127.0.0.1:2181,127.0.0.2:2181 ## zookeeper addresses
      zk-kafka-path: /kafka ## zookeeper namespace
      excludes-topics.BrokerTopicMetrics:
        - aaa
        - bbb
        - beexiao(.*?)
      jmx-excludes-metrics.BrokerTopicMetrics:
        - aa
        - bb
      jmx-excludes-metrics.RequestMetrics:
        - AlterConfigs
        - AlterReplicaLogDirs
        - ApiVersions
        - ControlledShutdown
        - CreateAcls
        - CreateDelegationToken
        - DeleteAcls
        - DeleteRecords
        - DescribeAcls
        - DescribeConfigs
        - DescribeDelegationToken
        - DescribeLogDirs
        - EndTxn
        - ExpireDelegationToken
        - InitProducerId
        - OffsetForLeaderEpoch
        - RenewDelegationToken
        - SaslAuthenticate
        - SaslHandshake
        - StopReplica
        - TxnOffsetCommit
        - WriteTxnMarkers
        - AddOffsetsToTxn
      jmx-excludes-attrs.BrokerTopicMetrics:
        - EventType
        - RateUnit
      jmx-excludes-attrs.GarbageCollector:
        - LastGcInfo
        - MemoryPoolNames
        - ObjectName
        - Valid
        - Name
      jmx-excludes-attrs.ReplicaManager:
        - EventType
        - RateUnit
      jmx-excludes-attrs.RequestMetrics:
        - EventType
        - RateUnit
        - FifteenMinuteRate
        - FiveMinuteRate
        - 75thPercentile
        - 98thPercentile
      jmx-excludes-attrs.LogFlushRateAndTimeMs:
        - LatencyUnit
        - RateUnit
        - EventType
        - FifteenMinuteRate
        - 50thPercentile
        - 75thPercentile
        - 98thPercentile
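
The excludes-topics list above mixes plain names (aaa) with a regular expression (beexiao(.*?)). One way such a list could be applied is sketched below, assuming every entry is treated as a regular expression that must match the whole topic name. The class TopicExcludeSketch and the method isExcluded are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the exporter's source.
final class TopicExcludeSketch {

    // True when any exclude entry, read as a regex, matches the full topic name.
    static boolean isExcluded(String topic, List<String> excludes) {
        for (String exclude : excludes) {
            if (Pattern.matches(exclude, topic)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> excludes = List.of("aaa", "bbb", "beexiao(.*?)");
        System.out.println(isExcluded("beexiao-orders", excludes));
        System.out.println(isExcluded("orders", excludes));
    }
}
```

Under this assumption a plain entry such as aaa excludes only the topic named exactly aaa, while beexiao(.*?) excludes every topic whose name starts with beexiao.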

Metric names exported so far

kafka_BrokerTopicMetrics_BytesInPerSec_Count
kafka_BrokerTopicMetrics_BytesInPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_BytesInPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_BytesInPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_BytesOutPerSec_Count
kafka_BrokerTopicMetrics_BytesOutPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_BytesOutPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_BytesOutPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesOutPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_Count
kafka_BrokerTopicMetrics_BytesRejectedPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_Count
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_Count
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_MessagesInPerSec_Count
kafka_BrokerTopicMetrics_MessagesInPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_MessagesInPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_MessagesInPerSec_MeanRate
kafka_BrokerTopicMetrics_MessagesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_Count
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_MeanRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_Count
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_MeanRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_Count
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_MeanRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_Count
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_Count
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_FifteenMinuteRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_FiveMinuteRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_OneMinuteRate
kafka_GarbageCollector_G1_Old_Generation_CollectionCount
kafka_GarbageCollector_G1_Old_Generation_CollectionTime
kafka_GarbageCollector_G1_Young_Generation_CollectionCount
kafka_GarbageCollector_G1_Young_Generation_CollectionTime
kafka_KafkaController_ActiveControllerCount_Value
kafka_KafkaController_ControllerState_Value
kafka_KafkaController_GlobalPartitionCount_Value
kafka_KafkaController_GlobalTopicCount_Value
kafka_KafkaController_OfflinePartitionsCount_Value
kafka_KafkaController_PreferredReplicaImbalanceCount_Value
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_Count
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_FifteenMinuteRate
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_FiveMinuteRate
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_MeanRate
kafka_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_OneMinuteRate
kafka_LogCleanerManager_max_dirty_percent_Value
kafka_LogCleanerManager_time_since_last_run_ms_Value
kafka_LogFlushStats_LogFlushRateAndTimeMs_95thPercentile
kafka_LogFlushStats_LogFlushRateAndTimeMs_999thPercentile
kafka_LogFlushStats_LogFlushRateAndTimeMs_99thPercentile
kafka_LogFlushStats_LogFlushRateAndTimeMs_Count
kafka_LogFlushStats_LogFlushRateAndTimeMs_FifteenMinuteRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_FiveMinuteRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_Max
kafka_LogFlushStats_LogFlushRateAndTimeMs_Mean
kafka_LogFlushStats_LogFlushRateAndTimeMs_MeanRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_Min
kafka_LogFlushStats_LogFlushRateAndTimeMs_OneMinuteRate
kafka_LogFlushStats_LogFlushRateAndTimeMs_StdDev
kafka_Partition_InSyncReplicasCount_Value
kafka_Partition_LastStableOffsetLag_Value
kafka_Partition_ReplicasCount_Value
kafka_Partition_UnderMinIsr_Value
kafka_Partition_UnderReplicated_Value
kafka_ReplicaManager_FailedIsrUpdatesPerSec_Count
kafka_ReplicaManager_FailedIsrUpdatesPerSec_FifteenMinuteRate
kafka_ReplicaManager_FailedIsrUpdatesPerSec_FiveMinuteRate
kafka_ReplicaManager_FailedIsrUpdatesPerSec_MeanRate
kafka_ReplicaManager_FailedIsrUpdatesPerSec_OneMinuteRate
kafka_ReplicaManager_IsrExpandsPerSec_Count
kafka_ReplicaManager_IsrExpandsPerSec_FifteenMinuteRate
kafka_ReplicaManager_IsrExpandsPerSec_FiveMinuteRate
kafka_ReplicaManager_IsrExpandsPerSec_MeanRate
kafka_ReplicaManager_IsrExpandsPerSec_OneMinuteRate
kafka_ReplicaManager_IsrShrinksPerSec_Count
kafka_ReplicaManager_IsrShrinksPerSec_FifteenMinuteRate
kafka_ReplicaManager_IsrShrinksPerSec_FiveMinuteRate
kafka_ReplicaManager_IsrShrinksPerSec_MeanRate
kafka_ReplicaManager_IsrShrinksPerSec_OneMinuteRate
kafka_ReplicaManager_LeaderCount_Value
kafka_ReplicaManager_OfflineReplicaCount_Value
kafka_ReplicaManager_PartitionCount_Value
kafka_ReplicaManager_UnderMinIsrPartitionCount_Value
kafka_ReplicaManager_UnderReplicatedPartitions_Value
kafka_RequestChannel_RequestQueueSize_Value
kafka_RequestChannel_ResponseQueueSize_Value
kafka_RequestMetrics_RequestQueueTimeMs_95thPercentile
kafka_RequestMetrics_RequestQueueTimeMs_999thPercentile
kafka_RequestMetrics_RequestQueueTimeMs_99thPercentile
kafka_RequestMetrics_RequestQueueTimeMs_Count
kafka_RequestMetrics_RequestQueueTimeMs_Max
kafka_RequestMetrics_RequestQueueTimeMs_Mean
kafka_RequestMetrics_RequestQueueTimeMs_Min
kafka_RequestMetrics_RequestQueueTimeMs_StdDev
kafka_RequestMetrics_RequestsPerSec_Count
kafka_RequestMetrics_RequestsPerSec_FifteenMinuteRate
kafka_RequestMetrics_RequestsPerSec_FiveMinuteRate
kafka_RequestMetrics_RequestsPerSec_MeanRate
kafka_RequestMetrics_RequestsPerSec_OneMinuteRate
kafka_SocketServer_MemoryPoolAvailable_Value
kafka_SocketServer_MemoryPoolUsed_Value
kafka_SocketServer_NetworkProcessorAvgIdlePercent_Value
kafka_Threading_CurrentThreadCpuTime
kafka_Threading_CurrentThreadUserTime
kafka_Threading_DaemonThreadCount
kafka_Threading_PeakThreadCount
kafka_Threading_ThreadCount
kafka_Threading_TotalStartedThreadCount
kafka_consumer_lag
kafka_topic_partitions
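
Most of the names above appear to follow the shape kafka_<type>_<name>_<attribute>, with characters such as spaces and hyphens replaced by underscores. The sketch below reproduces that shape from a JMX ObjectName and an attribute name. The mapping is inferred from the list rather than taken from the exporter's source, MetricNameSketch and metricName are hypothetical names, and keys such as topic or partition (which would more naturally become labels) are ignored here.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for illustration; the naming rule is inferred from the list above.
final class MetricNameSketch {

    // Builds kafka_<type>[_<name>]_<attribute> and replaces every character
    // outside [A-Za-z0-9_] with an underscore.
    static String metricName(String mbean, String attribute) {
        ObjectName objectName;
        try {
            objectName = new ObjectName(mbean);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
        StringBuilder sb = new StringBuilder("kafka_").append(objectName.getKeyProperty("type"));
        String name = objectName.getKeyProperty("name");
        if (name != null) {
            sb.append('_').append(name);
        }
        sb.append('_').append(attribute);
        return sb.toString().replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        System.out.println(metricName("java.lang:type=GarbageCollector,name=G1 Old Generation", "CollectionCount"));
        System.out.println(metricName("java.lang:type=Threading", "ThreadCount"));
    }
}
```

MBeans without a name key, such as java.lang:type=Threading, simply skip the middle segment, which matches entries like kafka_Threading_ThreadCount.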


Proposed Changes

Build a new Kafka exporter that runs on the JVM.
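
The core of such an exporter is reading an attribute over JMX and rendering it as a Prometheus text line. The sketch below does this against the local JVM's platform MBean server only, as an assumption to keep it self-contained. A real exporter would connect to each broker's remote JMX endpoint instead. JmxScrapeSketch and scrapeThreadCount are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper for illustration; reads the local JVM, not a Kafka broker.
final class JmxScrapeSketch {

    // Reads java.lang:type=Threading ThreadCount and renders "<metric> <value>".
    static String scrapeThreadCount() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            Object value = server.getAttribute(new ObjectName("java.lang:type=Threading"), "ThreadCount");
            return "kafka_Threading_ThreadCount " + value;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(scrapeThreadCount());
    }
}
```

The same getAttribute call is available on a remote MBeanServerConnection, so the rendering step would not need to change when the source is a broker rather than the local JVM.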

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users? Answer: Existing users can monitor their Kafka clusters more easily with a Prometheus exporter written in Java, and can collect as many metrics as they want.

  • If we are changing behavior how will we phase out the older behavior? Answer: Prometheus is a very good monitoring system for middleware like Kafka, and your ops team may already use it.

  • If we need special migration tools, describe them here. Answer: Some Prometheus servers and Prometheus Alertmanager.

  • When will we remove the existing behavior? Answer: Once all exporters run stably and you can view all metrics in a UI such as Grafana.

Rejected Alternatives

Let's do this!
