All Kafka rate metrics have a corresponding cumulative count metric with the suffix -total. For example, records-consumed-rate has a corresponding metric named records-consumed-total.

    The easiest way to see the available metrics is to fire up jconsole and point it at a running kafka client or server; this will allow browsing all metrics with JMX.
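    If JMX is not convenient, the same metrics can also be read programmatically through a client's metrics() method. Below is a minimal sketch (the bootstrap address and client id are placeholders) that prints every producer metric together with its current value, which also makes the -rate/-total pairing described above easy to spot:

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.common.serialization.StringSerializer;

        public class ListProducerMetrics {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
                props.put("client.id", "metrics-demo-producer");  // placeholder client id
                props.put("key.serializer", StringSerializer.class.getName());
                props.put("value.serializer", StringSerializer.class.getName());

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    // Every registered metric, keyed by group, name and tags.
                    producer.metrics().forEach((name, metric) ->
                            System.out.printf("%s / %s = %s%n", name.group(), name.name(), metric.metricValue()));
                }
            }
        }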

    Security Considerations for Remote Monitoring using JMX

    Apache Kafka disables remote JMX by default. You can enable remote monitoring using JMX by setting the environment variable JMX_PORT for processes started using the CLI, or by setting the standard Java system properties to enable remote JMX programmatically. You must enable security when enabling remote JMX in production scenarios to ensure that unauthorized users cannot monitor or control your broker or application, as well as the platform on which these are running. Note that authentication is disabled for JMX by default in Kafka, and security configs must be overridden for production deployments by setting the environment variable KAFKA_JMX_OPTS for processes started using the CLI or by setting appropriate Java system properties. See the standard Java documentation on monitoring and management using JMX for details on securing JMX.
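    As an illustration, the sketch below connects to a password-protected remote JMX endpoint and reads one of the client metrics described later in this section; the host, port, credentials, and client id are placeholders for your own setup:

        import java.util.HashMap;
        import java.util.Map;
        import javax.management.MBeanServerConnection;
        import javax.management.ObjectName;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class RemoteJmxRead {
            public static void main(String[] args) throws Exception {
                // Placeholder host/port; the port is whatever JMX_PORT (or the equivalent system properties) was set to.
                JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1.example.com:9999/jmxrmi");

                // Credentials are required once JMX authentication has been enabled (e.g. via KAFKA_JMX_OPTS).
                Map<String, Object> env = new HashMap<>();
                env.put(JMXConnector.CREDENTIALS, new String[] {"monitorRole", "secret"}); // placeholder credentials

                try (JMXConnector connector = JMXConnectorFactory.connect(url, env)) {
                    MBeanServerConnection conn = connector.getMBeanServerConnection();
                    ObjectName producerMetrics = new ObjectName("kafka.producer:type=producer-metrics,client-id=my-client");
                    System.out.println("connection-count = " + conn.getAttribute(producerMetrics, "connection-count"));
                }
            }
        }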

    We do graphing and alerting on the following metrics:

    The following metrics are available on producer/consumer/connector/streams instances. For specific metrics, please see the following sections.

    MBean name: kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    connection-close-rate: Connections closed per second in the window.
    connection-close-total: Total connections closed in the window.
    connection-creation-rate: New connections established per second in the window.
    connection-creation-total: Total new connections established in the window.
    network-io-rate: The average number of network operations (reads or writes) on all connections per second.
    network-io-total: The total number of network operations (reads or writes) on all connections.
    outgoing-byte-rate: The average number of outgoing bytes sent per second to all servers.
    outgoing-byte-total: The total number of outgoing bytes sent to all servers.
    request-rate: The average number of requests sent per second.
    request-total: The total number of requests sent.
    request-size-avg: The average size of all requests in the window.
    request-size-max: The maximum size of any request sent in the window.
    incoming-byte-rate: Bytes/second read off all sockets.
    incoming-byte-total: Total bytes read off all sockets.
    response-rate: Responses received per second.
    response-total: Total responses received.
    select-rate: Number of times the I/O layer checked for new I/O to perform per second.
    select-total: Total number of times the I/O layer checked for new I/O to perform.
    io-wait-time-ns-avg: The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.
    io-wait-ratio: The fraction of time the I/O thread spent waiting.
    io-time-ns-avg: The average length of time for I/O per select call in nanoseconds.
    io-ratio: The fraction of time the I/O thread spent doing I/O.
    connection-count: The current number of active connections.
    successful-authentication-rate: Connections per second that were successfully authenticated using SASL or SSL.
    successful-authentication-total: Total connections that were successfully authenticated using SASL or SSL.
    failed-authentication-rate: Connections per second that failed authentication.
    failed-authentication-total: Total connections that failed authentication.
    successful-reauthentication-rate: Connections per second that were successfully re-authenticated using SASL.
    successful-reauthentication-total: Total connections that were successfully re-authenticated using SASL.
    reauthentication-latency-max: The maximum latency in ms observed due to re-authentication.
    reauthentication-latency-avg: The average latency in ms observed due to re-authentication.
    failed-reauthentication-rate: Connections per second that failed re-authentication.
    failed-reauthentication-total: Total connections that failed re-authentication.
    successful-authentication-no-reauth-total: Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. This count can only be non-zero when such clients are in use.
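    These MBeans are registered with the platform MBean server of the JVM running the client, so they can also be read in-process without remote JMX. A minimal sketch (the attribute name is taken from the table above) that dumps connection-count for every producer client in the current JVM:

        import java.lang.management.ManagementFactory;
        import javax.management.MBeanServer;
        import javax.management.ObjectName;

        public class InProcessMetricDump {
            public static void dumpConnectionCounts() throws Exception {
                MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                // Matches every producer client registered in this JVM; use kafka.consumer or
                // kafka.connect in the pattern for the other client types.
                for (ObjectName name : server.queryNames(new ObjectName("kafka.producer:type=producer-metrics,*"), null)) {
                    Object value = server.getAttribute(name, "connection-count");
                    System.out.println(name.getKeyProperty("client-id") + " connection-count = " + value);
                }
            }
        }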

    The following per-broker metrics are available on producer/consumer/connector/streams instances, reported separately for each node the client is connected to. For specific metrics, please see the following sections.

    MBean name: kafka.[producer|consumer|connect]:type=[consumer|producer|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
    Metric/Attribute name: Description
    outgoing-byte-rate: The average number of outgoing bytes sent per second for a node.
    outgoing-byte-total: The total number of outgoing bytes sent for a node.
    request-rate: The average number of requests sent per second for a node.
    request-total: The total number of requests sent for a node.
    request-size-avg: The average size of all requests in the window for a node.
    request-size-max: The maximum size of any request sent in the window for a node.
    incoming-byte-rate: The average number of bytes received per second for a node.
    incoming-byte-total: The total number of bytes received for a node.
    request-latency-avg: The average request latency in ms for a node.
    request-latency-max: The maximum request latency in ms for a node.
    response-rate: Responses received per second for a node.
    response-total: Total responses received for a node.

    The following metrics are available on producer instances.

    MBean name: kafka.producer:type=producer-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    waiting-threads: The number of user threads blocked waiting for buffer memory to enqueue their records.
    buffer-total-bytes: The maximum amount of buffer memory the client can use (whether or not it is currently used).
    buffer-available-bytes: The total amount of buffer memory that is not being used (either unallocated or in the free list).
    bufferpool-wait-time: The fraction of time an appender waits for space allocation.

    MBean name: kafka.producer:type=producer-metrics,client-id="{client-id}"
    Attribute name: Description
    batch-size-avg: The average number of bytes sent per partition per request.
    batch-size-max: The max number of bytes sent per partition per request.
    batch-split-rate: The average number of batch splits per second.
    batch-split-total: The total number of batch splits.
    compression-rate-avg: The average compression rate of record batches, defined as the average ratio of the compressed batch size over the uncompressed size.
    metadata-age: The age in seconds of the current producer metadata being used.
    produce-throttle-time-avg: The average time in ms a request was throttled by a broker.
    produce-throttle-time-max: The maximum time in ms a request was throttled by a broker.
    record-error-rate: The average per-second number of record sends that resulted in errors.
    record-error-total: The total number of record sends that resulted in errors.
    record-queue-time-avg: The average time in ms record batches spent in the send buffer.
    record-queue-time-max: The maximum time in ms record batches spent in the send buffer.
    record-retry-rate: The average per-second number of retried record sends.
    record-retry-total: The total number of retried record sends.
    record-send-rate: The average number of records sent per second.
    record-send-total: The total number of records sent.
    record-size-avg: The average record size.
    record-size-max: The maximum record size.
    records-per-request-avg: The average number of records per request.
    request-latency-avg: The average request latency in ms.
    request-latency-max: The maximum request latency in ms.
    requests-in-flight: The current number of in-flight requests awaiting a response.

    MBean name: kafka.producer:type=producer-topic-metrics,client-id="{client-id}",topic="{topic}"
    Attribute name: Description
    byte-rate: The average number of bytes sent per second for a topic.
    byte-total: The total number of bytes sent for a topic.
    compression-rate: The average compression rate of record batches for a topic, defined as the average ratio of the compressed batch size over the uncompressed size.
    record-error-rate: The average per-second number of record sends that resulted in errors for a topic.
    record-error-total: The total number of record sends that resulted in errors for a topic.
    record-retry-rate: The average per-second number of retried record sends for a topic.
    record-retry-total: The total number of retried record sends for a topic.
    record-send-rate: The average number of records sent per second for a topic.
    record-send-total: The total number of records sent for a topic.
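    Individual attributes can also be looked up from producer.metrics() by matching the metric group, name, and tags. A minimal sketch (the topic name is supplied by the caller) that reads record-send-rate for a single topic:

        import org.apache.kafka.clients.producer.Producer;

        public class TopicSendRate {
            // Returns record-send-rate for the given topic, or -1.0 if the producer has not sent to it yet.
            public static double recordSendRate(Producer<?, ?> producer, String topic) {
                return producer.metrics().entrySet().stream()
                        .filter(e -> e.getKey().group().equals("producer-topic-metrics"))
                        .filter(e -> e.getKey().name().equals("record-send-rate"))
                        .filter(e -> topic.equals(e.getKey().tags().get("topic")))
                        .mapToDouble(e -> ((Number) e.getValue().metricValue()).doubleValue())
                        .findFirst()
                        .orElse(-1.0);
            }
        }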

    MBean name: kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    commit-latency-avg: The average time taken for a commit request.
    commit-latency-max: The max time taken for a commit request.
    commit-rate: The number of commit calls per second.
    commit-total: The total number of commit calls.
    assigned-partitions: The number of partitions currently assigned to this consumer.
    heartbeat-response-time-max: The max time taken to receive a response to a heartbeat request.
    heartbeat-rate: The average number of heartbeats per second.
    heartbeat-total: The total number of heartbeats.
    join-time-avg: The average time taken for a group rejoin.
    join-time-max: The max time taken for a group rejoin.
    join-rate: The number of group joins per second.
    join-total: The total number of group joins.
    sync-time-avg: The average time taken for a group sync.
    sync-time-max: The max time taken for a group sync.
    sync-rate: The number of group syncs per second.
    sync-total: The total number of group syncs.
    rebalance-latency-avg: The average time taken for a group rebalance.
    rebalance-latency-max: The max time taken for a group rebalance.
    rebalance-latency-total: The total time taken for group rebalances so far.
    rebalance-total: The total number of group rebalances participated in.
    rebalance-rate-per-hour: The number of group rebalances participated in per hour.
    failed-rebalance-total: The total number of failed group rebalances.
    failed-rebalance-rate-per-hour: The number of failed group rebalance events per hour.
    last-rebalance-seconds-ago: The number of seconds since the last rebalance event.
    last-heartbeat-seconds-ago: The number of seconds since the last coordinator heartbeat.
    partitions-revoked-latency-avg: The average time taken by the on-partitions-revoked rebalance listener callback.
    partitions-revoked-latency-max: The max time taken by the on-partitions-revoked rebalance listener callback.
    partitions-assigned-latency-avg: The average time taken by the on-partitions-assigned rebalance listener callback.
    partitions-assigned-latency-max: The max time taken by the on-partitions-assigned rebalance listener callback.
    partitions-lost-latency-avg: The average time taken by the on-partitions-lost rebalance listener callback.
    partitions-lost-latency-max: The max time taken by the on-partitions-lost rebalance listener callback.

    MBean name: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}"
    Attribute name: Description
    bytes-consumed-rate: The average number of bytes consumed per second.
    bytes-consumed-total: The total number of bytes consumed.
    fetch-latency-avg: The average time taken for a fetch request.
    fetch-latency-max: The max time taken for any fetch request.
    fetch-rate: The number of fetch requests per second.
    fetch-size-avg: The average number of bytes fetched per request.
    fetch-size-max: The maximum number of bytes fetched per request.
    fetch-throttle-time-avg: The average throttle time in ms.
    fetch-throttle-time-max: The maximum throttle time in ms.
    fetch-total: The total number of fetch requests.
    records-consumed-rate: The average number of records consumed per second.
    records-consumed-total: The total number of records consumed.
    records-lag-max: The maximum lag in terms of number of records for any partition in this window.
    records-lead-min: The minimum lead in terms of number of records for any partition in this window.
    records-per-request-avg: The average number of records in each request.

    MBean name: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}"
    Attribute name: Description
    bytes-consumed-rate: The average number of bytes consumed per second for a topic.
    bytes-consumed-total: The total number of bytes consumed for a topic.
    fetch-size-avg: The average number of bytes fetched per request for a topic.
    fetch-size-max: The maximum number of bytes fetched per request for a topic.
    records-consumed-rate: The average number of records consumed per second for a topic.
    records-consumed-total: The total number of records consumed for a topic.
    records-per-request-avg: The average number of records in each request for a topic.

    MBean name: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}"
    Attribute name: Description
    preferred-read-replica: The current read replica for the partition, or -1 if reading from the leader.
    records-lag: The latest lag of the partition.
    records-lag-avg: The average lag of the partition.
    records-lag-max: The max lag of the partition.
    records-lead: The latest lead of the partition.
    records-lead-avg: The average lead of the partition.
    records-lead-min: The min lead of the partition.

    A Connect worker process contains all the producer and consumer metrics as well as metrics specific to Connect. The worker process itself has a number of metrics, while each connector and task has additional metrics.
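    Connect does not expose a programmatic metrics API to end users, so these worker metrics are typically read over JMX (task status is also available through the REST API). A minimal sketch, assuming remote JMX is enabled on the worker and using a hypothetical connector named my-sink with task 0:

        import javax.management.MBeanServerConnection;
        import javax.management.ObjectName;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class ConnectTaskStatus {
            public static void main(String[] args) throws Exception {
                // Placeholder worker host/port.
                JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://connect-worker.example.com:9999/jmxrmi");
                try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                    MBeanServerConnection conn = connector.getMBeanServerConnection();
                    // Hypothetical connector/task; the MBean layout matches the tables below.
                    ObjectName task = new ObjectName("kafka.connect:type=connector-task-metrics,connector=my-sink,task=0");
                    System.out.println("status = " + conn.getAttribute(task, "status"));
                    System.out.println("running-ratio = " + conn.getAttribute(task, "running-ratio"));
                }
            }
        }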

    MBean name: kafka.connect:type=connect-worker-metrics
    Attribute name: Description
    connector-count: The number of connectors run in this worker.
    connector-startup-attempts-total: The total number of connector startups that this worker has attempted.
    connector-startup-failure-percentage: The average percentage of this worker's connector starts that failed.
    connector-startup-failure-total: The total number of connector starts that failed.
    connector-startup-success-percentage: The average percentage of this worker's connector starts that succeeded.
    connector-startup-success-total: The total number of connector starts that succeeded.
    task-count: The number of tasks run in this worker.
    task-startup-attempts-total: The total number of task startups that this worker has attempted.
    task-startup-failure-percentage: The average percentage of this worker's task starts that failed.
    task-startup-failure-total: The total number of task starts that failed.
    task-startup-success-percentage: The average percentage of this worker's task starts that succeeded.
    task-startup-success-total: The total number of task starts that succeeded.

    MBean name: kafka.connect:type=connect-worker-metrics,connector="{connector}"
    Attribute name: Description
    connector-destroyed-task-count: The number of destroyed tasks of the connector on the worker.
    connector-failed-task-count: The number of failed tasks of the connector on the worker.
    connector-paused-task-count: The number of paused tasks of the connector on the worker.
    connector-restarting-task-count: The number of restarting tasks of the connector on the worker.
    connector-running-task-count: The number of running tasks of the connector on the worker.
    connector-total-task-count: The number of tasks of the connector on the worker.
    connector-unassigned-task-count: The number of unassigned tasks of the connector on the worker.

    MBean name: kafka.connect:type=connect-worker-rebalance-metrics
    Attribute name: Description
    completed-rebalances-total: The total number of rebalances completed by this worker.
    connect-protocol: The Connect protocol used by this cluster.
    epoch: The epoch or generation number of this worker.
    leader-name: The name of the group leader.
    rebalance-avg-time-ms: The average time in milliseconds spent by this worker to rebalance.
    rebalance-max-time-ms: The maximum time in milliseconds spent by this worker to rebalance.
    rebalancing: Whether this worker is currently rebalancing.
    time-since-last-rebalance-ms: The time in milliseconds since this worker completed the most recent rebalance.

    MBean name: kafka.connect:type=connector-metrics,connector="{connector}"
    Attribute name: Description
    connector-class: The name of the connector class.
    connector-type: The type of the connector. One of 'source' or 'sink'.
    connector-version: The version of the connector class, as reported by the connector.
    status: The status of the connector. One of 'unassigned', 'running', 'paused', 'failed', or 'destroyed'.

    MBean name: kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    batch-size-avg: The average size of the batches processed by the connector.
    batch-size-max: The maximum size of the batches processed by the connector.
    offset-commit-avg-time-ms: The average time in milliseconds taken by this task to commit offsets.
    offset-commit-failure-percentage: The average percentage of this task's offset commit attempts that failed.
    offset-commit-max-time-ms: The maximum time in milliseconds taken by this task to commit offsets.
    offset-commit-success-percentage: The average percentage of this task's offset commit attempts that succeeded.
    pause-ratio: The fraction of time this task has spent in the pause state.
    running-ratio: The fraction of time this task has spent in the running state.
    status: The status of the connector task. One of 'unassigned', 'running', 'paused', 'failed', or 'destroyed'.

    MBean name: kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    offset-commit-completion-rate: The average per-second number of offset commit completions that were completed successfully.
    offset-commit-completion-total: The total number of offset commit completions that were completed successfully.
    offset-commit-seq-no: The current sequence number for offset commits.
    offset-commit-skip-rate: The average per-second number of offset commit completions that were received too late and skipped/ignored.
    offset-commit-skip-total: The total number of offset commit completions that were received too late and skipped/ignored.
    partition-count: The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
    put-batch-avg-time-ms: The average time taken by this task to put a batch of sink records.
    put-batch-max-time-ms: The maximum time taken by this task to put a batch of sink records.
    sink-record-active-count: The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    sink-record-active-count-avg: The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    sink-record-active-count-max: The maximum number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    sink-record-lag-max: The maximum lag in terms of number of records that the sink task is behind the consumer's position for any topic partition.
    sink-record-read-rate: The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
    sink-record-read-total: The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
    sink-record-send-rate: The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    sink-record-send-total: The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.

    MBean name: kafka.connect:type=source-task-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    poll-batch-avg-time-ms: The average time in milliseconds taken by this task to poll for a batch of source records.
    poll-batch-max-time-ms: The maximum time in milliseconds taken by this task to poll for a batch of source records.
    source-record-active-count: The number of records that have been produced by this task but not yet completely written to Kafka.
    source-record-active-count-avg: The average number of records that have been produced by this task but not yet completely written to Kafka.
    source-record-active-count-max: The maximum number of records that have been produced by this task but not yet completely written to Kafka.
    source-record-poll-rate: The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    source-record-poll-total: The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    source-record-write-rate: The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    source-record-write-total: The total number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.

    MBean name: kafka.connect:type=task-error-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    deadletterqueue-produce-failures: The number of failed writes to the dead letter queue.
    deadletterqueue-produce-requests: The number of attempted writes to the dead letter queue.
    last-error-timestamp: The epoch timestamp when this task last encountered an error.
    total-errors-logged: The number of errors that were logged.
    total-record-errors: The number of record processing errors in this task.
    total-record-failures: The number of record processing failures in this task.
    total-records-skipped: The number of records skipped due to errors.
    total-retries: The number of operations retried.

    A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to Streams. By default Kafka Streams has metrics with three recording levels: info, debug, and trace.

    Note that the metrics have a 4-layer hierarchy. At the top level there are client-level metrics for each started Kafka Streams client. Each client has stream threads, with their own metrics. Each stream thread has tasks, with their own metrics. Each task has a number of processor nodes, with their own metrics. Each task also has a number of state stores and record caches, all with their own metrics.
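    One way to see this hierarchy is to group the metrics returned by KafkaStreams#metrics() by their group name (stream-metrics, stream-thread-metrics, stream-task-metrics, and so on). A minimal sketch, assuming a running KafkaStreams instance is passed in:

        import java.util.Map;
        import java.util.TreeMap;
        import org.apache.kafka.common.Metric;
        import org.apache.kafka.common.MetricName;
        import org.apache.kafka.streams.KafkaStreams;

        public class StreamsMetricHierarchy {
            // Prints how many metrics exist at each level of the hierarchy (client, thread, task, node, store, cache).
            public static void printHierarchy(KafkaStreams streams) {
                Map<String, Integer> byGroup = new TreeMap<>();
                for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
                    byGroup.merge(entry.getKey().group(), 1, Integer::sum);
                }
                byGroup.forEach((group, count) -> System.out.println(group + ": " + count + " metrics"));
            }
        }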

    Use the metrics.recording.level configuration option to specify which metrics you want collected.
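    For example, the debug-level metrics listed below are only recorded if the recording level is raised from its default of info. A minimal sketch of setting it in the Streams configuration (application id and bootstrap address are placeholders):

        import java.util.Properties;
        import org.apache.kafka.streams.StreamsConfig;

        public class StreamsMetricsConfig {
            public static Properties baseConfig() {
                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metrics-demo-app");   // placeholder application id
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker address
                // metrics.recording.level: one of "info", "debug", or "trace" (the default is "info").
                props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "debug");
                return props;
            }
        }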

    All of the following metrics have a recording level of info:

    MBean name: kafka.streams:type=stream-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    version: The version of the Kafka Streams client.
    commit-id: The version control commit ID of the Kafka Streams client.
    application-id: The application ID of the Kafka Streams client.
    topology-description: The description of the topology executed in the Kafka Streams client.
    state: The state of the Kafka Streams client.
    failed-stream-threads: The number of failed stream threads since the start of the Kafka Streams client.

    All of the following metrics have a recording level of debug, except for the dropped-records-* and active-process-ratio metrics, which have a recording level of info:

    MBean name: kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
    Metric/Attribute name: Description
    process-latency-avg: The average execution time in ns for processing.
    process-latency-max: The maximum execution time in ns for processing.
    process-rate: The average number of processed records per second across all source processor nodes of this task.
    process-total: The total number of processed records across all source processor nodes of this task.
    commit-latency-avg: The average execution time in ns for committing.
    commit-latency-max: The maximum execution time in ns for committing.
    commit-rate: The average number of commit calls per second.
    commit-total: The total number of commit calls.
    record-lateness-avg: The average observed lateness of records (stream time - record timestamp).
    record-lateness-max: The max observed lateness of records (stream time - record timestamp).
    enforced-processing-rate: The average number of enforced processings per second.
    enforced-processing-total: The total number of enforced processings.
    dropped-records-rate: The average number of records dropped within this task.
    dropped-records-total: The total number of records dropped within this task.
    active-process-ratio: The fraction of time the stream thread spent on processing this task among all assigned active tasks.

    MBean name: kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
    Metric/Attribute name: Description
    process-rate: The average number of records processed by a source processor node per second.
    process-total: The total number of records processed by a source processor node.
    suppression-emit-rate: The rate at which records have been emitted downstream from suppression operation nodes.
    suppression-emit-total: The total number of records that have been emitted downstream from suppression operation nodes.
    record-e2e-latency-avg: The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-max: The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-min: The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

    All of the following metrics have a recording level of debug, except for the record-e2e-latency-* metrics, which have a recording level of trace. Note that the store-scope value is specified in StoreSupplier#metricsScope() for user-customized state stores (see the sketch after the list below); for built-in state stores, it is currently one of the following:

    • in-memory-state
    • in-memory-lru-state
    • in-memory-window-state
    • in-memory-suppression (for suppression buffers)
    • rocksdb-state (for RocksDB backed key-value store)
    • rocksdb-window-state (for RocksDB backed window store)
    • rocksdb-session-state (for RocksDB backed session store)

    Metrics suppression-buffer-size-avg, suppression-buffer-size-max, suppression-buffer-count-avg, and suppression-buffer-count-max are only available for suppression buffers. All other metrics are not available for suppression buffers.
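    As a concrete illustration of the user-customized case, a StoreSupplier can return its own scope from metricsScope(); the state-store metrics below then appear under [that-scope]-id in the MBean name. A minimal sketch that wraps a built-in in-memory store (the store name and scope are hypothetical):

        import org.apache.kafka.common.utils.Bytes;
        import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
        import org.apache.kafka.streams.state.KeyValueStore;
        import org.apache.kafka.streams.state.Stores;

        // Delegates storage to the built-in in-memory store but reports its metrics
        // under the scope "my-custom-state", i.e. my-custom-state-id=... in the MBean name.
        public class CustomScopeStoreSupplier implements KeyValueBytesStoreSupplier {
            private final KeyValueBytesStoreSupplier delegate = Stores.inMemoryKeyValueStore("orders-store");

            @Override
            public String name() {
                return delegate.name();
            }

            @Override
            public KeyValueStore<Bytes, byte[]> get() {
                return delegate.get();
            }

            @Override
            public String metricsScope() {
                return "my-custom-state";
            }
        }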

    MBean name: kafka.streams:type=stream-state-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),[store-scope]-id=([-.\w]+)
    (For the suppression-buffer-* metrics, the store tag is in-memory-suppression-id=([-.\w]+).)
    Metric/Attribute name: Description
    put-latency-avg: The average put execution time in ns.
    put-latency-max: The maximum put execution time in ns.
    put-if-absent-latency-avg: The average put-if-absent execution time in ns.
    put-if-absent-latency-max: The maximum put-if-absent execution time in ns.
    get-latency-avg: The average get execution time in ns.
    get-latency-max: The maximum get execution time in ns.
    delete-latency-avg: The average delete execution time in ns.
    delete-latency-max: The maximum delete execution time in ns.
    put-all-latency-avg: The average put-all execution time in ns.
    put-all-latency-max: The maximum put-all execution time in ns.
    all-latency-avg: The average all operation execution time in ns.
    all-latency-max: The maximum all operation execution time in ns.
    range-latency-avg: The average range execution time in ns.
    range-latency-max: The maximum range execution time in ns.
    flush-latency-avg: The average flush execution time in ns.
    flush-latency-max: The maximum flush execution time in ns.
    restore-latency-avg: The average restore execution time in ns.
    restore-latency-max: The maximum restore execution time in ns.
    put-rate: The average put rate for this store.
    put-if-absent-rate: The average put-if-absent rate for this store.
    get-rate: The average get rate for this store.
    delete-rate: The average delete rate for this store.
    put-all-rate: The average put-all rate for this store.
    all-rate: The average all operation rate for this store.
    range-rate: The average range rate for this store.
    flush-rate: The average flush rate for this store.
    restore-rate: The average restore rate for this store.
    suppression-buffer-size-avg: The average total size, in bytes, of the buffered data over the sampling window (suppression buffers only).
    suppression-buffer-size-max: The maximum total size, in bytes, of the buffered data over the sampling window (suppression buffers only).
    suppression-buffer-count-avg: The average number of records buffered over the sampling window (suppression buffers only).
    suppression-buffer-count-max: The maximum number of records buffered over the sampling window (suppression buffers only).
    record-e2e-latency-avg: The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-max: The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-min: The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

    RocksDB metrics are grouped into statistics-based metrics and properties-based metrics. The former are recorded from statistics that a RocksDB state store collects, whereas the latter are recorded from properties that RocksDB exposes. Statistics collected by RocksDB provide cumulative measurements over time, e.g., bytes written to the state store. Properties exposed by RocksDB provide current measurements, e.g., the amount of memory currently used. Note that the store-scope for built-in RocksDB state stores is currently one of the following:

    • rocksdb-state (for RocksDB backed key-value store)
    • rocksdb-window-state (for RocksDB backed window store)
    • rocksdb-session-state (for RocksDB backed session store)

    RocksDB Statistics-based Metrics: All of the following statistics-based metrics have a recording level of debug because collecting statistics in RocksDB may have an impact on performance. Statistics-based metrics are collected every minute from the RocksDB state stores. If a state store consists of multiple RocksDB instances, as is the case for WindowStores and SessionStores, each metric reports an aggregation over the RocksDB instances of the state store.

    MBean name: kafka.streams:type=stream-state-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),[store-scope]-id=([-.\w]+)
    Metric/Attribute name: Description
    bytes-written-rate: The average number of bytes written per second to the RocksDB state store.
    bytes-written-total: The total number of bytes written to the RocksDB state store.
    bytes-read-rate: The average number of bytes read per second from the RocksDB state store.
    bytes-read-total: The total number of bytes read from the RocksDB state store.
    memtable-bytes-flushed-rate: The average number of bytes flushed per second from the memtable to disk.
    memtable-bytes-flushed-total: The total number of bytes flushed from the memtable to disk.
    memtable-hit-ratio: The ratio of memtable hits relative to all lookups to the memtable.
    block-cache-data-hit-ratio: The ratio of block cache hits for data blocks relative to all lookups for data blocks to the block cache.
    block-cache-index-hit-ratio: The ratio of block cache hits for index blocks relative to all lookups for index blocks to the block cache.
    block-cache-filter-hit-ratio: The ratio of block cache hits for filter blocks relative to all lookups for filter blocks to the block cache.
    write-stall-duration-avg: The average duration of write stalls in ms.
    write-stall-duration-total: The total duration of write stalls in ms.
    bytes-read-compaction-rate: The average number of bytes read per second during compaction.
    bytes-written-compaction-rate: The average number of bytes written per second during compaction.
    number-open-files: The number of currently open files.
    number-file-errors-total: The total number of file errors that occurred.

    RocksDB Properties-based Metrics: All of the following properties-based metrics have a recording level of info and are recorded when the metrics are accessed. If a state store consists of multiple RocksDB instances, as is the case for WindowStores and SessionStores, each metric reports the sum over all the RocksDB instances of the state store, except for the block cache metrics block-cache-*. The block cache metrics report the sum over all RocksDB instances if each instance uses its own block cache, and they report the recorded value from only one instance if a single block cache is shared among all instances.

    Record Cache Metrics

    All of the following metrics have a recording level of debug:

    MBean name: kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
    Metric/Attribute name: Description
    hit-ratio-avg: The average cache hit ratio defined as the ratio of cache read hits over the total cache read requests.
    hit-ratio-min: The minimum cache hit ratio.
    hit-ratio-max: The maximum cache hit ratio.

    Others

    We recommend monitoring GC time and other JVM stats, as well as various server stats such as CPU utilization, I/O service time, etc. On the client side, we recommend monitoring the message/byte rate (global and per topic), request rate/size/time, and, on the consumer side, the max lag in messages among all partitions and the min fetch request rate. For a consumer to keep up, max lag needs to stay below a threshold and the min fetch rate needs to be larger than 0.
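    A minimal sketch of such a lag check, built on the records-lag-max metric from the consumer tables above (the threshold is whatever your alerting policy requires):

        import org.apache.kafka.clients.consumer.Consumer;

        public class ConsumerLagCheck {
            // Returns true if the worst per-partition lag observed in the current window exceeds the threshold.
            public static boolean isLagging(Consumer<?, ?> consumer, double maxAllowedLag) {
                return consumer.metrics().entrySet().stream()
                        .filter(e -> e.getKey().group().equals("consumer-fetch-manager-metrics"))
                        .filter(e -> e.getKey().name().equals("records-lag-max"))
                        .filter(e -> !e.getKey().tags().containsKey("partition")) // client-level metric only
                        .anyMatch(e -> ((Number) e.getValue().metricValue()).doubleValue() > maxAllowedLag);
            }
        }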