Metrics to Monitor in Kafka and Zookeeper using JMX Exporter (2024)

In this article, we will explore the critical metrics essential for monitoring Apache Kafka effectively. Understanding and tracking these key metrics are crucial for ensuring the performance, reliability, and scalability of your Kafka clusters in real-time data processing environments.

What is Apache Kafka?

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It’s like a highly efficient and scalable messaging system that can handle large volumes of data in real-time.

Apache Kafka Architecture

[Figure: Apache Kafka architecture diagram]

Let’s break down the components and their interaction using Zomato, a food delivery app, as an example:

Producers:

  • In Kafka, producers are processes or applications that publish streams of data (records) to Kafka topics.
  • In Zomato’s case, various services could act as producers:
    • An order placement service might publish a stream of records whenever a new order is created. This record could include details like customer ID, restaurant ID, and order items.
    • A real-time location service might publish updates on the location of delivery personnel.

Brokers:

  • Kafka brokers are servers that store the published streams of records. They act as the central nervous system of the Kafka architecture.
  • Zomato would likely run a cluster of Kafka brokers to handle the high volume of data generated by its various services.

Topics:

  • Topics are categories or feeds in Kafka where related records are grouped. A topic can have multiple partitions (shards) for scalability.
  • Zomato could have topics for different purposes:
    • A topic named “order_events” might hold all the order placement records.
    • Another topic named “delivery_updates” might hold location updates for delivery personnel.

Consumers:

  • Consumers are processes or applications that subscribe to topics of interest and consume the published streams of records.
  • In Zomato’s scenario, various consumer applications might be subscribed to relevant topics:
    • A service managing order deliveries might subscribe to the “order_events” topic to receive notifications about new orders and assign them to delivery personnel.
    • A real-time tracking dashboard might subscribe to the “delivery_updates” topic to display the live location of delivery personnel.

Zookeeper:

  • While not explicitly shown in the image, Kafka often uses Zookeeper, a distributed coordination service, for tasks like leader election (choosing which replica of a partition handles reads/writes) and maintaining cluster configuration.
  • In Zomato’s case, Zookeeper would ensure coordination among the Kafka brokers in the cluster.
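To make the topic/partition mechanics above concrete, here is a minimal sketch (plain Python, no Kafka client required) of how keyed records map to partitions. Kafka's default partitioner actually hashes the key bytes with murmur2; the hash below is a dependency-free stand-in, and the topic and key names are invented.

```python
# Illustrative sketch: how keyed records map to partitions.
# Kafka's default partitioner uses murmur2 on the key bytes; hashlib.md5
# is used here only to keep the example reproducible and self-contained.
import hashlib

NUM_PARTITIONS = 3

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a record key to a partition."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for one order land in the same partition, preserving their order.
orders = ["order-1001", "order-1002", "order-1001", "order-1003"]
placements = {key: partition_for(key) for key in orders}
print(placements)
```

Because the mapping is deterministic, every record keyed by "order-1001" lands in the same partition, which is what lets a consumer see one order's events in sequence.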

Important Metrics to Monitor in Kafka

A few metrics are especially important to watch:

  • Number of active controllers: should always be 1
    Metric: kafka_controller_kafkacontroller_activecontrollercount
  • Number of under-replicated partitions: should always be 0
    Metric: kafka_cluster_partition_underreplicated
  • Number of offline partitions: should always be 0
    Metric: kafka_controller_kafkacontroller_offlinepartitionscount
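If these metrics are scraped into Prometheus via the jmx_exporter (as set up later in this article), the three invariants above translate directly into alerting rules. The rule group below is an illustrative sketch; the alert names, `for:` durations, and severity labels are our own choices, not part of any standard.

```yaml
groups:
  - name: kafka-core-health
    rules:
      # Exactly one broker must act as controller at any time.
      - alert: KafkaNoSingleActiveController
        expr: sum(kafka_controller_kafkacontroller_activecontrollercount) != 1
        for: 5m
        labels:
          severity: critical
      # Any under-replicated partition means reduced redundancy.
      - alert: KafkaUnderReplicatedPartitions
        expr: sum(kafka_cluster_partition_underreplicated) > 0
        for: 10m
        labels:
          severity: warning
      # Offline partitions are neither readable nor writable.
      - alert: KafkaOfflinePartitions
        expr: sum(kafka_controller_kafkacontroller_offlinepartitionscount) > 0
        for: 1m
        labels:
          severity: critical
```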

Apache Kafka Metrics

Kafka metrics can be broken down into five categories:

  1. Kafka server (broker) metrics
  2. Kafka Producer metrics
  3. Kafka Consumer metrics
  4. Zookeeper metrics
  5. JVM Metrics

1. Broker Metrics

Monitoring and alerting on issues as they emerge in your broker cluster is critical since all messages must pass through a Kafka broker to be consumed.

Key Broker Metrics:

  • Topic Activity: Track the volume of messages being produced and consumed across different topics. This helps identify popular topics, potential bottlenecks, and overall cluster load.
  • Broker Performance: Monitor key broker metrics like CPU, memory usage, and network I/O. This allows you to identify overloaded brokers and potential resource constraints.
  • Replication: Ensure data integrity and redundancy by monitoring replication metrics. These metrics track the flow of data copies between replicas and identify any replication lags or failures.
  • Consumer Groups: Gain insights into consumer group behavior. Monitor metrics like consumer offsets and lag to ensure consumers are actively processing messages and identify any lagging consumers.
  • Errors: Quickly identify and troubleshoot issues by monitoring error metrics. These metrics track errors like produce request failures, fetch request failures, and invalid message formats.
  • UnderReplicatedPartitions: The number of under-replicated partitions across all topics on the broker. Under-replicated partitions are a leading indicator of one or more brokers being unavailable.
  • IsrShrinksPerSec / IsrExpandsPerSec: If a broker goes down, the in-sync replica sets (ISRs) for some of its partitions shrink. When that broker comes back up, the ISRs expand again once the replicas are fully caught up.
  • ActiveControllerCount: Indicates whether the broker is the active controller; the sum across the cluster should always equal 1, since only one broker acts as the controller at any given time.
  • OfflinePartitionsCount: The number of partitions that don't have an active leader and are therefore neither writable nor readable. A non-zero value indicates that brokers are unavailable.
  • LeaderElectionRateAndTimeMs: A partition leader election happens when ZooKeeper loses its connection to the leader. This metric may indicate that a broker is unavailable.
  • UncleanLeaderElectionsPerSec: If the broker leading a partition becomes unavailable, a new leader may be chosen from the out-of-sync replicas. This metric can indicate potential message loss.
  • TotalTimeMs: The total time taken to process a request.
  • PurgatorySize: The number of requests waiting in purgatory. Can help identify the main causes of delay.
  • BytesInPerSec / BytesOutPerSec: The rate of data that brokers receive from producers and that consumers read from brokers. This is an indicator of the overall throughput or workload of the Kafka cluster.
  • RequestsPerSecond: The frequency of requests from producers, consumers, and follower brokers.
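As a rough illustration of how the "always 1 / always 0" checks from the table above can be evaluated, the sketch below parses a hand-written Prometheus exposition snippet (the kind jmx_exporter serves on its /metrics endpoint) and applies those rules. The sample values are invented; a real scrape comes from the exporter's HTTP port.

```python
# Sketch: evaluating broker-health metrics from a jmx_exporter scrape.
# The payload below is hand-written for illustration.
SCRAPE = """\
kafka_controller_kafkacontroller_activecontrollercount 1.0
kafka_controller_kafkacontroller_offlinepartitionscount 0.0
kafka_cluster_partition_underreplicated{partition="0",topic="order_events"} 0.0
kafka_cluster_partition_underreplicated{partition="1",topic="order_events"} 2.0
"""

def parse_metrics(text):
    """Return a list of (metric_name, value) pairs from exposition text."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name_part, value = line.rsplit(" ", 1)
        name = name_part.split("{", 1)[0]
        samples.append((name, float(value)))
    return samples

def health_checks(samples):
    """Sum each metric across its label sets and apply the invariants."""
    totals = {}
    for name, value in samples:
        totals[name] = totals.get(name, 0.0) + value
    return {
        "one_active_controller": totals.get(
            "kafka_controller_kafkacontroller_activecontrollercount") == 1.0,
        "no_offline_partitions": totals.get(
            "kafka_controller_kafkacontroller_offlinepartitionscount", 0.0) == 0.0,
        "no_underreplicated": totals.get(
            "kafka_cluster_partition_underreplicated", 0.0) == 0.0,
    }

print(health_checks(parse_metrics(SCRAPE)))
```

In this sample, one partition reports 2 under-replicated replicas, so the under-replication check fails while the other two pass.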

2. Producer Metrics

Producer metrics provide valuable insights into the behavior and performance of applications sending messages to your Kafka cluster.

Key Producer Metrics:

  • Message Production Rate: The number of messages produced per second by the producer application. This helps gauge the overall message volume being sent to Kafka.
  • Batch Size: The average size of message batches sent by the producer. Larger batches can improve throughput, but finding the optimal size depends on factors like topic replication and network latency.
  • Delivery Rate: The rate at which messages are successfully delivered to Kafka brokers. This metric helps identify any bottlenecks or delays in the message production pipeline.
  • Latency: The time it takes for a message to be sent from the producer to the Kafka broker. Analyzing latency can reveal potential issues like network congestion or overloaded brokers.
  • Producer Errors: Track errors encountered by the producer, such as produce request failures or serialization errors. Identifying these errors can help diagnose and fix issues with the producer application.
  • compression-rate-avg: The average compression rate of sent batches.
  • response-rate: The average number of responses received per second, per producer.
  • request-rate: The average number of requests sent per second, per producer.
  • request-latency-avg: The average request latency in milliseconds.
  • outgoing-byte-rate: The average number of outgoing bytes per second.
  • io-wait-time-ns-avg: The average length of time the I/O thread spent waiting for a socket, in nanoseconds.
  • batch-size-avg: The average number of bytes sent per partition per request.
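To see how windowed averages like these are derived, here is a small sketch that recomputes request-rate, request-latency-avg, and outgoing-byte-rate from a list of (size, latency) samples. The window length and sample values are invented; the real producer maintains these over sliding time windows rather than a fixed batch.

```python
# Sketch: deriving producer averages from per-request samples.
# All numbers below are invented for illustration.
WINDOW_SECONDS = 30.0

# (request_size_bytes, latency_ms) for requests completed in the window
requests = [(4096, 12.0), (8192, 15.5), (2048, 9.5), (4096, 11.0)]

request_rate = len(requests) / WINDOW_SECONDS                    # requests/sec
request_latency_avg = sum(l for _, l in requests) / len(requests)  # ms
outgoing_byte_rate = sum(s for s, _ in requests) / WINDOW_SECONDS  # bytes/sec

print(f"request-rate:        {request_rate:.2f} req/s")
print(f"request-latency-avg: {request_latency_avg:.2f} ms")
print(f"outgoing-byte-rate:  {outgoing_byte_rate:.1f} B/s")
```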

3. Consumer Metrics

Consumer metrics are crucial for understanding how efficiently your applications are processing messages from Kafka topics.

Consumer metrics offer a window into various aspects of your Kafka consumers, including:

  • Consumption Rate: Track the number of messages a consumer is processing per second. This helps gauge overall processing efficiency and identify consumers that might be falling behind.
  • Fetch Behavior: Monitor metrics like fetch size and frequency to understand how consumers are requesting data from brokers. This can reveal potential inefficiencies in data fetching strategies.
  • Offsets: Track consumer offsets to determine their progress within a topic partition. Offsets indicate the last message a consumer has successfully processed. Lagging offsets could signal slow processing or consumer failures.
  • Commit Intervals: Monitor how often consumers commit their offsets to Kafka. Frequent commits ensure timely processing updates but can introduce additional overhead. Conversely, infrequent commits might lead to data loss during consumer failures.
  • Errors: Identify and diagnose issues related to message consumption. Consumer error metrics might reveal problems like invalid messages, network errors, or timeouts.
  • records-lag: The number of messages the consumer is behind the producer on a partition.
  • records-lag-max: The maximum record lag. An increasing value means the consumer is not keeping up with the producers.
  • bytes-consumed-rate: The average bytes consumed per second, per consumer, for a specific topic or across all topics.
  • records-consumed-rate: The average number of records consumed per second for a specific topic or across all topics.
  • fetch-rate: The number of fetch requests per second from the consumer.
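Consumer lag is simple arithmetic over offsets, which the sketch below makes explicit: records-lag per partition is the broker's log-end offset minus the group's committed offset, and records-lag-max is the worst of those. All offset values here are invented.

```python
# Sketch: computing records-lag and records-lag-max per partition.
# In practice, log-end offsets come from the brokers and committed
# offsets from the consumer group.
log_end_offsets = {
    ("order_events", 0): 1500,
    ("order_events", 1): 980,
    ("order_events", 2): 2100,
}
committed = {
    ("order_events", 0): 1500,
    ("order_events", 1): 950,
    ("order_events", 2): 1800,
}

records_lag = {tp: log_end_offsets[tp] - committed[tp] for tp in log_end_offsets}
records_lag_max = max(records_lag.values())

print(records_lag)      # lag per (topic, partition)
print(records_lag_max)  # worst partition is this many records behind
```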

4. ZooKeeper Metrics

ZooKeeper, the crucial distributed coordination service for many Kafka deployments, also offers a rich set of metrics to monitor its health and performance.

Categories of ZooKeeper metrics:

  • Cluster State: Monitor metrics like the number of active servers, followers, and observers in your ZooKeeper ensemble. This ensures quorum health and identifies potential issues like server outages or connectivity problems.
  • Request Processing: Track metrics like the number of requests per second (reads, writes), request latencies, and failed requests. This helps identify overloaded servers or potential bottlenecks within ZooKeeper.
  • Watcher Performance: Watchers are a core ZooKeeper feature for notifications on data changes. Monitor metrics like the number of watchers and average watch event latency to ensure efficient change notification mechanisms.
  • Synchronization: ZooKeeper uses synchronization primitives like locks. Track metrics like lock acquisition times and contention rates to identify potential synchronization bottlenecks in your applications.
  • outstanding-requests: The number of requests queued on the server.
  • avg-latency: The average response time to a client request, in milliseconds.
  • num-alive-connections: The number of clients connected to ZooKeeper.
  • followers: The number of active followers.
  • pending-syncs: The number of pending syncs with followers.
  • open-file-descriptor-count: The number of file descriptors in use.
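The same ZooKeeper counters can also be read without JMX through ZooKeeper's `mntr` four-letter command (for example, `echo mntr | nc <zk-host> 2181`). The sketch below parses a hand-written sample of that output; the values are invented.

```python
# Sketch: parsing ZooKeeper 'mntr' output into a metrics dict.
# Sample output is hand-written; real output comes from the client port.
MNTR_OUTPUT = """\
zk_avg_latency 1
zk_outstanding_requests 0
zk_num_alive_connections 5
zk_followers 2
zk_pending_syncs 0
zk_open_file_descriptor_count 46
"""

metrics = {}
for line in MNTR_OUTPUT.splitlines():
    key, value = line.split()
    metrics[key] = int(value)

# A healthy three-node ensemble shows 2 followers and an empty queue.
print(metrics["zk_followers"], metrics["zk_outstanding_requests"])
```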

5. JVM Metrics

While Kafka itself provides valuable metrics, the underlying JVM (Java Virtual Machine) offers another crucial layer of monitoring for your Kafka deployment. JVM metrics expose insights into the health and performance of the Java environment running your Kafka.

  • Memory Usage: Track metrics like heap memory usage, non-heap memory usage, and garbage collection activity. This helps ensure sufficient memory allocation and identify potential memory leaks or excessive garbage collection overhead impacting Kafka’s performance.
  • Threading: Monitor metrics like thread count, CPU usage by threads, and thread pool utilization. This helps identify potential thread starvation or overloaded thread pools, ensuring efficient resource allocation for Kafka tasks.
  • Class Loading: Track metrics like the number of loaded classes and class loading times. This helps identify issues with classpath configuration or excessive class loading impacting application startup times.
  • File Descriptors: Monitor the number of open file descriptors to identify potential resource exhaustion and ensure proper file descriptor management within the Kafka brokers.

JVM garbage collector metrics

  • CollectionCount: The total number of young- or old-generation garbage collections performed by the JVM.
  • CollectionTime: The total amount of time, in milliseconds, that the JVM spent executing young- or old-generation garbage collections.
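Because CollectionCount and CollectionTime are cumulative counters, they are most useful as deltas between scrapes. The sketch below turns two consecutive (invented) readings into a GC-overhead percentage; the 60-second scrape interval is an assumption.

```python
# Sketch: deriving GC overhead from two scrapes of cumulative counters.
# All values are invented for illustration.
SCRAPE_INTERVAL_MS = 60_000.0

prev = {"CollectionCount": 120, "CollectionTime": 4_500}  # cumulative ms
curr = {"CollectionCount": 126, "CollectionTime": 4_830}

collections = curr["CollectionCount"] - prev["CollectionCount"]
gc_time_ms = curr["CollectionTime"] - prev["CollectionTime"]
gc_overhead_pct = 100.0 * gc_time_ms / SCRAPE_INTERVAL_MS

print(f"{collections} collections, {gc_time_ms} ms in GC "
      f"({gc_overhead_pct:.2f}% of the interval)")
```

A sustained overhead of a few percent or more is usually worth investigating, since time spent in GC is time the broker is not serving requests.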

Host metrics

  • Page cache reads ratio: The ratio of reads served from the page cache to reads from disk.
  • Disk usage: The amount of used and available disk space.
  • CPU usage: The CPU is rarely the source of performance issues, but spikes in CPU usage should be investigated.
  • Network bytes sent/received: The amount of incoming and outgoing network traffic.

The official Prometheus jmx_exporter GitHub repository provides sample configuration files for Kafka. For this setup, we'll use the kafka-2_0_0.yml sample configuration.

lowercaseOutputName: true

rules:
  # Special cases and very specific rules
  - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
    name: kafka_server_$1_$2
    type: GAUGE
    labels:
      clientId: "$3"
      topic: "$4"
      partition: "$5"
  - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
    name: kafka_server_$1_$2
    type: GAUGE
    labels:
      clientId: "$3"
      broker: "$4:$5"
  - pattern: kafka.coordinator.(\w+)<type=(.+), name=(.+)><>Value
    name: kafka_coordinator_$1_$2_$3
    type: GAUGE

  # Generic per-second counters with 0-2 key/value pairs
  - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
    name: kafka_$1_$2_$3_total
    type: COUNTER
    labels:
      "$4": "$5"
      "$6": "$7"
  - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
    name: kafka_$1_$2_$3_total
    type: COUNTER
    labels:
      "$4": "$5"
  - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
    name: kafka_$1_$2_$3_total
    type: COUNTER

  # Quota specific rules
  - pattern: kafka.server<type=(.+), user=(.+), client-id=(.+)><>([a-z-]+)
    name: kafka_server_quota_$4
    type: GAUGE
    labels:
      resource: "$1"
      user: "$2"
      clientId: "$3"
  - pattern: kafka.server<type=(.+), client-id=(.+)><>([a-z-]+)
    name: kafka_server_quota_$3
    type: GAUGE
    labels:
      resource: "$1"
      clientId: "$2"
  - pattern: kafka.server<type=(.+), user=(.+)><>([a-z-]+)
    name: kafka_server_quota_$3
    type: GAUGE
    labels:
      resource: "$1"
      user: "$2"

  # Generic gauges with 0-2 key/value pairs
  - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
    name: kafka_$1_$2_$3
    type: GAUGE
    labels:
      "$4": "$5"
      "$6": "$7"
  - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
    name: kafka_$1_$2_$3
    type: GAUGE
    labels:
      "$4": "$5"
  - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
    name: kafka_$1_$2_$3
    type: GAUGE

  # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
  #
  # Note that these are missing the '_sum' metric!
  - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
    name: kafka_$1_$2_$3_count
    type: COUNTER
    labels:
      "$4": "$5"
      "$6": "$7"
  - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
    name: kafka_$1_$2_$3
    type: GAUGE
    labels:
      "$4": "$5"
      "$6": "$7"
      quantile: "0.$8"
  - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
    name: kafka_$1_$2_$3_count
    type: COUNTER
    labels:
      "$4": "$5"
  - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
    name: kafka_$1_$2_$3
    type: GAUGE
    labels:
      "$4": "$5"
      quantile: "0.$6"
  - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
    name: kafka_$1_$2_$3_count
    type: COUNTER
  - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
    name: kafka_$1_$2_$3
    type: GAUGE
    labels:
      quantile: "0.$4"

  # Generic gauges for MeanRate Percent
  # Ex) kafka.server<type=KafkaRequestHandlerPool, name=RequestHandlerAvgIdlePercent><>MeanRate
  - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
    name: kafka_$1_$2_$3_percent
    type: GAUGE
  - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
    name: kafka_$1_$2_$3_percent
    type: GAUGE
  - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
    name: kafka_$1_$2_$3_percent
    type: GAUGE
    labels:
      "$4": "$5"
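To see what these rules actually do, the sketch below mimics the simplest generic gauge rule together with lowercaseOutputName: true, rewriting one MBean reading into a Prometheus metric name. It is a toy: the real exporter also handles rule ordering, label extraction, and name sanitization.

```python
# Sketch: how a jmx_exporter rule rewrites an MBean reading into a
# Prometheus metric name. Mimics the generic gauge rule
# kafka.(\w+)<type=(.+), name=(.+)><>Value with lowercaseOutputName: true.
import re

RULE = re.compile(r"kafka\.(\w+)<type=(.+), name=(.+)><>Value")

def rewrite(mbean_reading):
    """Return the Prometheus metric name for a matching reading, else None."""
    m = RULE.fullmatch(mbean_reading)
    if not m:
        return None
    return f"kafka_{m.group(1)}_{m.group(2)}_{m.group(3)}".lower()

print(rewrite("kafka.server<type=ReplicaManager, name=LeaderCount><>Value"))
# -> kafka_server_replicamanager_leadercount
```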

Conclusion:

Monitoring Apache Kafka means tracking essential metrics across brokers, producers, consumers, and ZooKeeper to ensure optimal performance and reliability in real-time data processing environments. By focusing on these key metrics, organizations can proactively manage their Kafka clusters and maintain high availability for their streaming applications.

Reference:

For reference, visit the official website.

For any queries, contact us at Fosstechnix.com.

Related Articles:

Install Apache Kafka and Zookeeper on Ubuntu 24.04 LTS


FAQs

How do you prepare to monitor Kafka with JMX metrics?

Configure JMX:
  1. server.hostname specifies the host a JMX client connects to. The default is localhost (127.0.0.1).
  2. jmxremote=true enables the JMX remote agent and enables the connector to listen through a specific port.
  3. jmxremote.authenticate=false indicates that authentication is off by default.

What are the metrics of Kafka monitoring?

Summary of key Kafka monitoring concepts

Key metrics include message throughput, broker resource utilization, consumer lag, and latency. Collecting and analyzing metrics is essential for identifying and troubleshooting issues, optimizing performance, and meeting SLOs and SLAs.

What is the difference between Kafka exporter and JMX exporter?

If you are unfamiliar with them, JMX Exporter gives you the metrics of each individual broker, such as memory, GC, and Kafka-specific metrics (kafkajmx.* in Wavefront), while Kafka Exporter gives you metrics on the overall state of the cluster, such as the offsets of partitions (kafka.* in Wavefront).

Which metric should you monitor on a Kafka producer to determine how many acknowledgments it is receiving per second?

Metric to watch: Response rate

For producers, the response rate represents the rate of responses received from brokers. Brokers respond to producers when the data has been received.

Which metrics can you monitor with a JMX extension?

JMX metrics are available for all Java-based processes monitored by OneAgent. Once your extension is uploaded, Dynatrace automatically begins querying the defined metrics for all Java processes. To find the metrics, go to a relevant process page and click Further details.

How to check JMX metrics?

Open the JMX panel to view the metrics.
  1. Click Connect in the New Connection dialog. The JMX panel opens.
  2. Open the MBeans tab and expand com.genesyslab.gemc.metrics. All of the Web Engagement metrics are there.
  3. To refresh the metrics, click Refresh.

What are the key performance indicators of Kafka?

The four main metric categories Kafka provides are Kafka server (broker) metrics, producer metrics, consumer metrics, and ZooKeeper metrics. These metrics help you monitor Kafka and resolve issues before they become more serious.

What are metrics in monitoring?

An effective monitoring system collects data, aggregates it, stores it, visualizes metrics, and alerts you about any problems in your systems. Metrics are the basic values used to understand historical trends, compare various factors, identify patterns and anomalies, and find errors and problems.

What are the best monitoring tools for Apache Kafka?

Summary of popular Kafka monitoring tools:
  • Prometheus with Kafka Exporter: Excellent metric visualization and querying capabilities.
  • Burrow: Specializes in monitoring Kafka consumer lag.
  • Confluent Control Center: Comprehensive cluster management and monitoring.

How does JMX exporter work?

JMX Exporter uses Java's JMX mechanism to read the monitoring data of the JVM runtime, and then converts it into a metrics format that can be recognized by Prometheus, so that Prometheus can monitor and collect it. The parameters are specified when the JVM starts, and the RMI interface of JMX is exposed.
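In practice, that means passing the agent on the broker's JVM command line. A typical invocation looks like the following configuration fragment; the jar path, agent version, and port 7071 are illustrative placeholders, not fixed values.

```shell
# Attach jmx_exporter as a Java agent when starting a Kafka broker.
# <version> and the paths/ports are placeholders for your own setup.
export KAFKA_OPTS="-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent-<version>.jar=7071:/opt/jmx_exporter/kafka-2_0_0.yml"
bin/kafka-server-start.sh config/server.properties

# Metrics are then served in Prometheus format at:
#   http://<broker-host>:7071/metrics
```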

What is the port of JMX exporter in Kafka?

When the JMX exporter is enabled, the JMX port in the Kafka container is set to 5555, and the jmx-exporter sidecar uses it to collect the metrics and expose them on port 5556. Kafka commands use the same env var (JMX_PORT), so they will try to open the port configured for the server.

What is the use of Kafka exporter?

Kafka Exporter is an open source project to enhance monitoring of Apache Kafka brokers and clients. Kafka Exporter is provided with AMQ Streams for deployment with a Kafka cluster to extract additional metrics data from Kafka brokers related to offsets, consumer groups, consumer lag, and topics.

How do I check Kafka metrics?

To monitor Kafka metrics, use Grafana dashboards. First, choose the type of dashboard that suits you and create it. Then choose a data source, such as Prometheus or Graphite.

How many messages can Kafka handle per second?

Kafka generally has better performance. If you are looking for more throughput, Kafka can go up to around 1,000,000 messages per second, whereas the throughput for RabbitMQ is around 4K-10K messages per second. This is due to the architecture, as Kafka was designed around throughput.

How to do performance testing for Kafka using JMeter?

Below are the steps to set up JMeter and Kafka on Windows:
  1. Step 1: Install JMeter. To set up JMeter on your system, visit the Apache JMeter website to download the latest binary file. ...
  2. Step 2: Install and configure Kafka. ...
  3. Step 3: Create a test plan with JMeter for Kafka testing. ...
  4. Step 4: Run the load test.

How to check if JMX is enabled in Kafka?

The JMX feature is enabled in the connector by default. To disable JMX, set the jmx property to false. Snowpipe supports Kafka connector version 1.6.0 and later.

How to enable JMX in Kafka Connect?

JMX is enabled for Kafka by default. You can set the following JVM environment variables to configure JMX monitoring for your Docker image in a Compose file, a Dockerfile, or from the command line when you run Kafka.
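For Docker-based deployments, this is usually done with environment variables in a Compose file. The fragment below is a sketch; the image tag, hostname, and port values are examples, and the exact variable names can differ between Kafka images (these are the ones used by Confluent's images).

```yaml
# docker-compose fragment (illustrative; image name/version are examples)
services:
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    environment:
      KAFKA_JMX_PORT: 9999
      KAFKA_JMX_HOSTNAME: kafka
      KAFKA_JMX_OPTS: >-
        -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false
    ports:
      - "9999:9999"
```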

Author: Greg Kuvalis
