All Kafka rate metrics have a corresponding cumulative count metric with the suffix -total. For example, records-consumed-rate has a corresponding metric named records-consumed-total.

    The easiest way to see the available metrics is to fire up jconsole and point it at a running kafka client or server; this will allow browsing all metrics with JMX.
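    If JMX is not convenient, the same metrics can also be read programmatically through a client's metrics() method. Below is a minimal sketch (the bootstrap address and client id are placeholders) that prints every producer metric together with its current value, which also makes the -rate/-total pairing described above easy to spot:

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.common.serialization.StringSerializer;

        public class ListProducerMetrics {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
                props.put("client.id", "metrics-demo-producer");  // placeholder client id
                props.put("key.serializer", StringSerializer.class.getName());
                props.put("value.serializer", StringSerializer.class.getName());

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    // Every registered metric, keyed by group, name and tags.
                    producer.metrics().forEach((name, metric) ->
                            System.out.printf("%s / %s = %s%n", name.group(), name.name(), metric.metricValue()));
                }
            }
        }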

    Security Considerations for Remote Monitoring using JMX

    Apache Kafka disables remote JMX by default. You can enable remote monitoring using JMX by setting the environment variable JMX_PORT for processes started using the CLI, or by setting the standard Java system properties to enable remote JMX programmatically. You must enable security when enabling remote JMX in production scenarios to ensure that unauthorized users cannot monitor or control your broker or application, as well as the platform on which these are running. Note that authentication is disabled for JMX by default in Kafka, and security configs must be overridden for production deployments by setting the environment variable KAFKA_JMX_OPTS for processes started using the CLI or by setting appropriate Java system properties. See the standard Java documentation on monitoring and management using JMX for details on securing JMX.
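    As an illustration, the sketch below connects to a password-protected remote JMX endpoint and reads one of the client metrics described later in this section; the host, port, credentials, and client id are placeholders for your own setup:

        import java.util.HashMap;
        import java.util.Map;
        import javax.management.MBeanServerConnection;
        import javax.management.ObjectName;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class RemoteJmxRead {
            public static void main(String[] args) throws Exception {
                // Placeholder host/port; the port is whatever JMX_PORT (or the equivalent system properties) was set to.
                JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1.example.com:9999/jmxrmi");

                // Credentials are required once JMX authentication has been enabled (e.g. via KAFKA_JMX_OPTS).
                Map<String, Object> env = new HashMap<>();
                env.put(JMXConnector.CREDENTIALS, new String[] {"monitorRole", "secret"}); // placeholder credentials

                try (JMXConnector connector = JMXConnectorFactory.connect(url, env)) {
                    MBeanServerConnection conn = connector.getMBeanServerConnection();
                    ObjectName producerMetrics = new ObjectName("kafka.producer:type=producer-metrics,client-id=my-client");
                    System.out.println("connection-count = " + conn.getAttribute(producerMetrics, "connection-count"));
                }
            }
        }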

    We do graphing and alerting on the following metrics:

    The following metrics are available on producer/consumer/connector/streams instances. For specific metrics, please see the following sections.

    MBean name: kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    connection-close-rate: Connections closed per second in the window.
    connection-close-total: Total connections closed in the window.
    connection-creation-rate: New connections established per second in the window.
    connection-creation-total: Total new connections established in the window.
    network-io-rate: The average number of network operations (reads or writes) on all connections per second.
    network-io-total: The total number of network operations (reads or writes) on all connections.
    outgoing-byte-rate: The average number of outgoing bytes sent per second to all servers.
    outgoing-byte-total: The total number of outgoing bytes sent to all servers.
    request-rate: The average number of requests sent per second.
    request-total: The total number of requests sent.
    request-size-avg: The average size of all requests in the window.
    request-size-max: The maximum size of any request sent in the window.
    incoming-byte-rate: Bytes/second read off all sockets.
    incoming-byte-total: Total bytes read off all sockets.
    response-rate: Responses received per second.
    response-total: Total responses received.
    select-rate: Number of times the I/O layer checked for new I/O to perform per second.
    select-total: Total number of times the I/O layer checked for new I/O to perform.
    io-wait-time-ns-avg: The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.
    io-wait-ratio: The fraction of time the I/O thread spent waiting.
    io-time-ns-avg: The average length of time for I/O per select call in nanoseconds.
    io-ratio: The fraction of time the I/O thread spent doing I/O.
    connection-count: The current number of active connections.
    successful-authentication-rate: Connections per second that were successfully authenticated using SASL or SSL.
    successful-authentication-total: Total connections that were successfully authenticated using SASL or SSL.
    failed-authentication-rate: Connections per second that failed authentication.
    failed-authentication-total: Total connections that failed authentication.
    successful-reauthentication-rate: Connections per second that were successfully re-authenticated using SASL.
    successful-reauthentication-total: Total connections that were successfully re-authenticated using SASL.
    reauthentication-latency-max: The maximum latency in ms observed due to re-authentication.
    reauthentication-latency-avg: The average latency in ms observed due to re-authentication.
    failed-reauthentication-rate: Connections per second that failed re-authentication.
    failed-reauthentication-total: Total connections that failed re-authentication.
    successful-authentication-no-reauth-total: Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. This count can only be non-zero when such clients are in use.
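    These MBeans are registered with the platform MBean server of the JVM running the client, so they can also be read in-process without remote JMX. A minimal sketch (the attribute name is taken from the table above) that dumps connection-count for every producer client in the current JVM:

        import java.lang.management.ManagementFactory;
        import javax.management.MBeanServer;
        import javax.management.ObjectName;

        public class InProcessMetricDump {
            public static void dumpConnectionCounts() throws Exception {
                MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                // Matches every producer client registered in this JVM; use kafka.consumer or
                // kafka.connect in the pattern for the other client types.
                for (ObjectName name : server.queryNames(new ObjectName("kafka.producer:type=producer-metrics,*"), null)) {
                    Object value = server.getAttribute(name, "connection-count");
                    System.out.println(name.getKeyProperty("client-id") + " connection-count = " + value);
                }
            }
        }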

    The following per-broker metrics are available on producer/consumer/connector/streams instances, reported separately for each node the client is connected to. For specific metrics, please see the following sections.

    MBean name: kafka.[producer|consumer|connect]:type=[consumer|producer|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
    Metric/Attribute name: Description
    outgoing-byte-rate: The average number of outgoing bytes sent per second for a node.
    outgoing-byte-total: The total number of outgoing bytes sent for a node.
    request-rate: The average number of requests sent per second for a node.
    request-total: The total number of requests sent for a node.
    request-size-avg: The average size of all requests in the window for a node.
    request-size-max: The maximum size of any request sent in the window for a node.
    incoming-byte-rate: The average number of bytes received per second for a node.
    incoming-byte-total: The total number of bytes received for a node.
    request-latency-avg: The average request latency in ms for a node.
    request-latency-max: The maximum request latency in ms for a node.
    response-rate: Responses received per second for a node.
    response-total: Total responses received for a node.

    The following metrics are available on producer instances.

    MBean name: kafka.producer:type=producer-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    waiting-threads: The number of user threads blocked waiting for buffer memory to enqueue their records.
    buffer-total-bytes: The maximum amount of buffer memory the client can use (whether or not it is currently used).
    buffer-available-bytes: The total amount of buffer memory that is not being used (either unallocated or in the free list).
    bufferpool-wait-time: The fraction of time an appender waits for space allocation.

    MBean name: kafka.producer:type=producer-metrics,client-id="{client-id}"
    Attribute name: Description
    batch-size-avg: The average number of bytes sent per partition per request.
    batch-size-max: The max number of bytes sent per partition per request.
    batch-split-rate: The average number of batch splits per second.
    batch-split-total: The total number of batch splits.
    compression-rate-avg: The average compression rate of record batches, defined as the average ratio of the compressed batch size over the uncompressed size.
    metadata-age: The age in seconds of the current producer metadata being used.
    produce-throttle-time-avg: The average time in ms a request was throttled by a broker.
    produce-throttle-time-max: The maximum time in ms a request was throttled by a broker.
    record-error-rate: The average per-second number of record sends that resulted in errors.
    record-error-total: The total number of record sends that resulted in errors.
    record-queue-time-avg: The average time in ms record batches spent in the send buffer.
    record-queue-time-max: The maximum time in ms record batches spent in the send buffer.
    record-retry-rate: The average per-second number of retried record sends.
    record-retry-total: The total number of retried record sends.
    record-send-rate: The average number of records sent per second.
    record-send-total: The total number of records sent.
    record-size-avg: The average record size.
    record-size-max: The maximum record size.
    records-per-request-avg: The average number of records per request.
    request-latency-avg: The average request latency in ms.
    request-latency-max: The maximum request latency in ms.
    requests-in-flight: The current number of in-flight requests awaiting a response.

    MBean name: kafka.producer:type=producer-topic-metrics,client-id="{client-id}",topic="{topic}"
    Attribute name: Description
    byte-rate: The average number of bytes sent per second for a topic.
    byte-total: The total number of bytes sent for a topic.
    compression-rate: The average compression rate of record batches for a topic, defined as the average ratio of the compressed batch size over the uncompressed size.
    record-error-rate: The average per-second number of record sends that resulted in errors for a topic.
    record-error-total: The total number of record sends that resulted in errors for a topic.
    record-retry-rate: The average per-second number of retried record sends for a topic.
    record-retry-total: The total number of retried record sends for a topic.
    record-send-rate: The average number of records sent per second for a topic.
    record-send-total: The total number of records sent for a topic.
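    Individual attributes can also be looked up from producer.metrics() by matching the metric group, name, and tags. A minimal sketch (the topic name is supplied by the caller) that reads record-send-rate for a single topic:

        import org.apache.kafka.clients.producer.Producer;

        public class TopicSendRate {
            // Returns record-send-rate for the given topic, or -1.0 if the producer has not sent to it yet.
            public static double recordSendRate(Producer<?, ?> producer, String topic) {
                return producer.metrics().entrySet().stream()
                        .filter(e -> e.getKey().group().equals("producer-topic-metrics"))
                        .filter(e -> e.getKey().name().equals("record-send-rate"))
                        .filter(e -> topic.equals(e.getKey().tags().get("topic")))
                        .mapToDouble(e -> ((Number) e.getValue().metricValue()).doubleValue())
                        .findFirst()
                        .orElse(-1.0);
            }
        }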

    MBean name: kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    commit-latency-avg: The average time taken for a commit request.
    commit-latency-max: The max time taken for a commit request.
    commit-rate: The number of commit calls per second.
    commit-total: The total number of commit calls.
    assigned-partitions: The number of partitions currently assigned to this consumer.
    heartbeat-response-time-max: The max time taken to receive a response to a heartbeat request.
    heartbeat-rate: The average number of heartbeats per second.
    heartbeat-total: The total number of heartbeats.
    join-time-avg: The average time taken for a group rejoin.
    join-time-max: The max time taken for a group rejoin.
    join-rate: The number of group joins per second.
    join-total: The total number of group joins.
    sync-time-avg: The average time taken for a group sync.
    sync-time-max: The max time taken for a group sync.
    sync-rate: The number of group syncs per second.
    sync-total: The total number of group syncs.
    rebalance-latency-avg: The average time taken for a group rebalance.
    rebalance-latency-max: The max time taken for a group rebalance.
    rebalance-latency-total: The total time taken for group rebalances so far.
    rebalance-total: The total number of group rebalances participated in.
    rebalance-rate-per-hour: The number of group rebalances participated in per hour.
    failed-rebalance-total: The total number of failed group rebalances.
    failed-rebalance-rate-per-hour: The number of failed group rebalance events per hour.
    last-rebalance-seconds-ago: The number of seconds since the last rebalance event.
    last-heartbeat-seconds-ago: The number of seconds since the last coordinator heartbeat.
    partitions-revoked-latency-avg: The average time taken by the on-partitions-revoked rebalance listener callback.
    partitions-revoked-latency-max: The max time taken by the on-partitions-revoked rebalance listener callback.
    partitions-assigned-latency-avg: The average time taken by the on-partitions-assigned rebalance listener callback.
    partitions-assigned-latency-max: The max time taken by the on-partitions-assigned rebalance listener callback.
    partitions-lost-latency-avg: The average time taken by the on-partitions-lost rebalance listener callback.
    partitions-lost-latency-max: The max time taken by the on-partitions-lost rebalance listener callback.

    MBean name: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}"
    Attribute name: Description
    bytes-consumed-rate: The average number of bytes consumed per second.
    bytes-consumed-total: The total number of bytes consumed.
    fetch-latency-avg: The average time taken for a fetch request.
    fetch-latency-max: The max time taken for any fetch request.
    fetch-rate: The number of fetch requests per second.
    fetch-size-avg: The average number of bytes fetched per request.
    fetch-size-max: The maximum number of bytes fetched per request.
    fetch-throttle-time-avg: The average throttle time in ms.
    fetch-throttle-time-max: The maximum throttle time in ms.
    fetch-total: The total number of fetch requests.
    records-consumed-rate: The average number of records consumed per second.
    records-consumed-total: The total number of records consumed.
    records-lag-max: The maximum lag in terms of number of records for any partition in this window.
    records-lead-min: The minimum lead in terms of number of records for any partition in this window.
    records-per-request-avg: The average number of records in each request.

    MBean name: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}"
    Attribute name: Description
    bytes-consumed-rate: The average number of bytes consumed per second for a topic.
    bytes-consumed-total: The total number of bytes consumed for a topic.
    fetch-size-avg: The average number of bytes fetched per request for a topic.
    fetch-size-max: The maximum number of bytes fetched per request for a topic.
    records-consumed-rate: The average number of records consumed per second for a topic.
    records-consumed-total: The total number of records consumed for a topic.
    records-per-request-avg: The average number of records in each request for a topic.

    MBean name: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}"
    Attribute name: Description
    preferred-read-replica: The current read replica for the partition, or -1 if reading from the leader.
    records-lag: The latest lag of the partition.
    records-lag-avg: The average lag of the partition.
    records-lag-max: The max lag of the partition.
    records-lead: The latest lead of the partition.
    records-lead-avg: The average lead of the partition.
    records-lead-min: The min lead of the partition.

    A Connect worker process contains all the producer and consumer metrics as well as metrics specific to Connect. The worker process itself has a number of metrics, while each connector and task has additional metrics.
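    Connect does not expose a programmatic metrics API to end users, so these worker metrics are typically read over JMX (task status is also available through the REST API). A minimal sketch, assuming remote JMX is enabled on the worker and using a hypothetical connector named my-sink with task 0:

        import javax.management.MBeanServerConnection;
        import javax.management.ObjectName;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class ConnectTaskStatus {
            public static void main(String[] args) throws Exception {
                // Placeholder worker host/port.
                JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://connect-worker.example.com:9999/jmxrmi");
                try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                    MBeanServerConnection conn = connector.getMBeanServerConnection();
                    // Hypothetical connector/task; the MBean layout matches the tables below.
                    ObjectName task = new ObjectName("kafka.connect:type=connector-task-metrics,connector=my-sink,task=0");
                    System.out.println("status = " + conn.getAttribute(task, "status"));
                    System.out.println("running-ratio = " + conn.getAttribute(task, "running-ratio"));
                }
            }
        }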

    MBean name: kafka.connect:type=connect-worker-metrics
    Attribute name: Description
    connector-count: The number of connectors run in this worker.
    connector-startup-attempts-total: The total number of connector startups that this worker has attempted.
    connector-startup-failure-percentage: The average percentage of this worker's connector starts that failed.
    connector-startup-failure-total: The total number of connector starts that failed.
    connector-startup-success-percentage: The average percentage of this worker's connector starts that succeeded.
    connector-startup-success-total: The total number of connector starts that succeeded.
    task-count: The number of tasks run in this worker.
    task-startup-attempts-total: The total number of task startups that this worker has attempted.
    task-startup-failure-percentage: The average percentage of this worker's task starts that failed.
    task-startup-failure-total: The total number of task starts that failed.
    task-startup-success-percentage: The average percentage of this worker's task starts that succeeded.
    task-startup-success-total: The total number of task starts that succeeded.

    MBean name: kafka.connect:type=connect-worker-metrics,connector="{connector}"
    Attribute name: Description
    connector-destroyed-task-count: The number of destroyed tasks of the connector on the worker.
    connector-failed-task-count: The number of failed tasks of the connector on the worker.
    connector-paused-task-count: The number of paused tasks of the connector on the worker.
    connector-restarting-task-count: The number of restarting tasks of the connector on the worker.
    connector-running-task-count: The number of running tasks of the connector on the worker.
    connector-total-task-count: The number of tasks of the connector on the worker.
    connector-unassigned-task-count: The number of unassigned tasks of the connector on the worker.

    MBean name: kafka.connect:type=connect-worker-rebalance-metrics
    Attribute name: Description
    completed-rebalances-total: The total number of rebalances completed by this worker.
    connect-protocol: The Connect protocol used by this cluster.
    epoch: The epoch or generation number of this worker.
    leader-name: The name of the group leader.
    rebalance-avg-time-ms: The average time in milliseconds spent by this worker to rebalance.
    rebalance-max-time-ms: The maximum time in milliseconds spent by this worker to rebalance.
    rebalancing: Whether this worker is currently rebalancing.
    time-since-last-rebalance-ms: The time in milliseconds since this worker completed the most recent rebalance.

    MBean name: kafka.connect:type=connector-metrics,connector="{connector}"
    Attribute name: Description
    connector-class: The name of the connector class.
    connector-type: The type of the connector. One of 'source' or 'sink'.
    connector-version: The version of the connector class, as reported by the connector.
    status: The status of the connector. One of 'unassigned', 'running', 'paused', 'failed', or 'destroyed'.

    MBean name: kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    batch-size-avg: The average size of the batches processed by the connector.
    batch-size-max: The maximum size of the batches processed by the connector.
    offset-commit-avg-time-ms: The average time in milliseconds taken by this task to commit offsets.
    offset-commit-failure-percentage: The average percentage of this task's offset commit attempts that failed.
    offset-commit-max-time-ms: The maximum time in milliseconds taken by this task to commit offsets.
    offset-commit-success-percentage: The average percentage of this task's offset commit attempts that succeeded.
    pause-ratio: The fraction of time this task has spent in the pause state.
    running-ratio: The fraction of time this task has spent in the running state.
    status: The status of the connector task. One of 'unassigned', 'running', 'paused', 'failed', or 'destroyed'.

    MBean name: kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    offset-commit-completion-rate: The average per-second number of offset commit completions that were completed successfully.
    offset-commit-completion-total: The total number of offset commit completions that were completed successfully.
    offset-commit-seq-no: The current sequence number for offset commits.
    offset-commit-skip-rate: The average per-second number of offset commit completions that were received too late and skipped/ignored.
    offset-commit-skip-total: The total number of offset commit completions that were received too late and skipped/ignored.
    partition-count: The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
    put-batch-avg-time-ms: The average time taken by this task to put a batch of sink records.
    put-batch-max-time-ms: The maximum time taken by this task to put a batch of sink records.
    sink-record-active-count: The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    sink-record-active-count-avg: The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    sink-record-active-count-max: The maximum number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    sink-record-lag-max: The maximum lag in terms of number of records that the sink task is behind the consumer's position for any topic partition.
    sink-record-read-rate: The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
    sink-record-read-total: The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
    sink-record-send-rate: The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    sink-record-send-total: The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.

    MBean name: kafka.connect:type=source-task-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    poll-batch-avg-time-ms: The average time in milliseconds taken by this task to poll for a batch of source records.
    poll-batch-max-time-ms: The maximum time in milliseconds taken by this task to poll for a batch of source records.
    source-record-active-count: The number of records that have been produced by this task but not yet completely written to Kafka.
    source-record-active-count-avg: The average number of records that have been produced by this task but not yet completely written to Kafka.
    source-record-active-count-max: The maximum number of records that have been produced by this task but not yet completely written to Kafka.
    source-record-poll-rate: The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    source-record-poll-total: The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    source-record-write-rate: The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    source-record-write-total: The total number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.

    MBean name: kafka.connect:type=task-error-metrics,connector="{connector}",task="{task}"
    Attribute name: Description
    deadletterqueue-produce-failures: The number of failed writes to the dead letter queue.
    deadletterqueue-produce-requests: The number of attempted writes to the dead letter queue.
    last-error-timestamp: The epoch timestamp when this task last encountered an error.
    total-errors-logged: The number of errors that were logged.
    total-record-errors: The number of record processing errors in this task.
    total-record-failures: The number of record processing failures in this task.
    total-records-skipped: The number of records skipped due to errors.
    total-retries: The number of operations retried.

    A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to Streams. By default Kafka Streams has metrics with three recording levels: info, debug, and trace.

    Note that the metrics have a 4-layer hierarchy. At the top level there are client-level metrics for each started Kafka Streams client. Each client has stream threads, with their own metrics. Each stream thread has tasks, with their own metrics. Each task has a number of processor nodes, with their own metrics. Each task also has a number of state stores and record caches, all with their own metrics.
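    One way to see this hierarchy is to group the metrics returned by KafkaStreams#metrics() by their group name (stream-metrics, stream-thread-metrics, stream-task-metrics, and so on). A minimal sketch, assuming a running KafkaStreams instance is passed in:

        import java.util.Map;
        import java.util.TreeMap;
        import org.apache.kafka.common.Metric;
        import org.apache.kafka.common.MetricName;
        import org.apache.kafka.streams.KafkaStreams;

        public class StreamsMetricHierarchy {
            // Prints how many metrics exist at each level of the hierarchy (client, thread, task, node, store, cache).
            public static void printHierarchy(KafkaStreams streams) {
                Map<String, Integer> byGroup = new TreeMap<>();
                for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
                    byGroup.merge(entry.getKey().group(), 1, Integer::sum);
                }
                byGroup.forEach((group, count) -> System.out.println(group + ": " + count + " metrics"));
            }
        }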

    Use the metrics.recording.level configuration option to specify which metrics you want collected.
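    For example, the debug-level metrics listed below are only recorded if the recording level is raised from its default of info. A minimal sketch of setting it in the Streams configuration (application id and bootstrap address are placeholders):

        import java.util.Properties;
        import org.apache.kafka.streams.StreamsConfig;

        public class StreamsMetricsConfig {
            public static Properties baseConfig() {
                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metrics-demo-app");   // placeholder application id
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker address
                // metrics.recording.level: one of "info", "debug", or "trace" (the default is "info").
                props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "debug");
                return props;
            }
        }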

    All of the following metrics have a recording level of info:

    MBean name: kafka.streams:type=stream-metrics,client-id=([-.\w]+)
    Metric/Attribute name: Description
    version: The version of the Kafka Streams client.
    commit-id: The version control commit ID of the Kafka Streams client.
    application-id: The application ID of the Kafka Streams client.
    topology-description: The description of the topology executed in the Kafka Streams client.
    state: The state of the Kafka Streams client.
    failed-stream-threads: The number of failed stream threads since the start of the Kafka Streams client.

    All of the following metrics have a recording level of debug, except for the dropped-records-* and active-process-ratio metrics, which have a recording level of info:

    MBean name: kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
    Metric/Attribute name: Description
    process-latency-avg: The average execution time in ns for processing.
    process-latency-max: The maximum execution time in ns for processing.
    process-rate: The average number of processed records per second across all source processor nodes of this task.
    process-total: The total number of processed records across all source processor nodes of this task.
    commit-latency-avg: The average execution time in ns for committing.
    commit-latency-max: The maximum execution time in ns for committing.
    commit-rate: The average number of commit calls per second.
    commit-total: The total number of commit calls.
    record-lateness-avg: The average observed lateness of records (stream time - record timestamp).
    record-lateness-max: The max observed lateness of records (stream time - record timestamp).
    enforced-processing-rate: The average number of enforced processings per second.
    enforced-processing-total: The total number of enforced processings.
    dropped-records-rate: The average number of records dropped within this task.
    dropped-records-total: The total number of records dropped within this task.
    active-process-ratio: The fraction of time the stream thread spent on processing this task among all assigned active tasks.

    MBean name: kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
    Metric/Attribute name: Description
    process-rate: The average number of records processed by a source processor node per second.
    process-total: The total number of records processed by a source processor node.
    suppression-emit-rate: The rate at which records have been emitted downstream from suppression operation nodes.
    suppression-emit-total: The total number of records that have been emitted downstream from suppression operation nodes.
    record-e2e-latency-avg: The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-max: The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-min: The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

    All of the following metrics have a recording level of debug, except for the record-e2e-latency-* metrics, which have a recording level of trace. Note that the store-scope value is specified in StoreSupplier#metricsScope() for user-customized state stores (see the sketch after the list below); for built-in state stores, it is currently one of the following:

    • in-memory-state
    • in-memory-lru-state
    • in-memory-window-state
    • in-memory-suppression (for suppression buffers)
    • rocksdb-state (for RocksDB backed key-value store)
    • rocksdb-window-state (for RocksDB backed window store)
    • rocksdb-session-state (for RocksDB backed session store)

    Metrics suppression-buffer-size-avg, suppression-buffer-size-max, suppression-buffer-count-avg, and suppression-buffer-count-max are only available for suppression buffers. All other metrics are not available for suppression buffers.
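    As a concrete illustration of the user-customized case, a StoreSupplier can return its own scope from metricsScope(); the state-store metrics below then appear under [that-scope]-id in the MBean name. A minimal sketch that wraps a built-in in-memory store (the store name and scope are hypothetical):

        import org.apache.kafka.common.utils.Bytes;
        import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
        import org.apache.kafka.streams.state.KeyValueStore;
        import org.apache.kafka.streams.state.Stores;

        // Delegates storage to the built-in in-memory store but reports its metrics
        // under the scope "my-custom-state", i.e. my-custom-state-id=... in the MBean name.
        public class CustomScopeStoreSupplier implements KeyValueBytesStoreSupplier {
            private final KeyValueBytesStoreSupplier delegate = Stores.inMemoryKeyValueStore("orders-store");

            @Override
            public String name() {
                return delegate.name();
            }

            @Override
            public KeyValueStore<Bytes, byte[]> get() {
                return delegate.get();
            }

            @Override
            public String metricsScope() {
                return "my-custom-state";
            }
        }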

    MBean name: kafka.streams:type=stream-state-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),[store-scope]-id=([-.\w]+)
    (For the suppression-buffer-* metrics, the store tag is in-memory-suppression-id=([-.\w]+).)
    Metric/Attribute name: Description
    put-latency-avg: The average put execution time in ns.
    put-latency-max: The maximum put execution time in ns.
    put-if-absent-latency-avg: The average put-if-absent execution time in ns.
    put-if-absent-latency-max: The maximum put-if-absent execution time in ns.
    get-latency-avg: The average get execution time in ns.
    get-latency-max: The maximum get execution time in ns.
    delete-latency-avg: The average delete execution time in ns.
    delete-latency-max: The maximum delete execution time in ns.
    put-all-latency-avg: The average put-all execution time in ns.
    put-all-latency-max: The maximum put-all execution time in ns.
    all-latency-avg: The average all operation execution time in ns.
    all-latency-max: The maximum all operation execution time in ns.
    range-latency-avg: The average range execution time in ns.
    range-latency-max: The maximum range execution time in ns.
    flush-latency-avg: The average flush execution time in ns.
    flush-latency-max: The maximum flush execution time in ns.
    restore-latency-avg: The average restore execution time in ns.
    restore-latency-max: The maximum restore execution time in ns.
    put-rate: The average put rate for this store.
    put-if-absent-rate: The average put-if-absent rate for this store.
    get-rate: The average get rate for this store.
    delete-rate: The average delete rate for this store.
    put-all-rate: The average put-all rate for this store.
    all-rate: The average all operation rate for this store.
    range-rate: The average range rate for this store.
    flush-rate: The average flush rate for this store.
    restore-rate: The average restore rate for this store.
    suppression-buffer-size-avg: The average total size, in bytes, of the buffered data over the sampling window (suppression buffers only).
    suppression-buffer-size-max: The maximum total size, in bytes, of the buffered data over the sampling window (suppression buffers only).
    suppression-buffer-count-avg: The average number of records buffered over the sampling window (suppression buffers only).
    suppression-buffer-count-max: The maximum number of records buffered over the sampling window (suppression buffers only).
    record-e2e-latency-avg: The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-max: The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.
    record-e2e-latency-min: The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

    RocksDB metrics are grouped into statistics-based metrics and properties-based metrics. The former are recorded from statistics that a RocksDB state store collects, whereas the latter are recorded from properties that RocksDB exposes. Statistics collected by RocksDB provide cumulative measurements over time, e.g., bytes written to the state store. Properties exposed by RocksDB provide current measurements, e.g., the amount of memory currently used. Note that the store-scope for built-in RocksDB state stores is currently one of the following:

    • rocksdb-state (for RocksDB backed key-value store)
    • rocksdb-window-state (for RocksDB backed window store)
    • rocksdb-session-state (for RocksDB backed session store)

    RocksDB Statistics-based Metrics: All of the following statistics-based metrics have a recording level of debug because collecting statistics in RocksDB may have an impact on performance. Statistics-based metrics are collected every minute from the RocksDB state stores. If a state store consists of multiple RocksDB instances, as is the case for WindowStores and SessionStores, each metric reports an aggregation over the RocksDB instances of the state store.

    MBean name: kafka.streams:type=stream-state-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),[store-scope]-id=([-.\w]+)
    Metric/Attribute name: Description
    bytes-written-rate: The average number of bytes written per second to the RocksDB state store.
    bytes-written-total: The total number of bytes written to the RocksDB state store.
    bytes-read-rate: The average number of bytes read per second from the RocksDB state store.
    bytes-read-total: The total number of bytes read from the RocksDB state store.
    memtable-bytes-flushed-rate: The average number of bytes flushed per second from the memtable to disk.
    memtable-bytes-flushed-total: The total number of bytes flushed from the memtable to disk.
    memtable-hit-ratio: The ratio of memtable hits relative to all lookups to the memtable.
    block-cache-data-hit-ratio: The ratio of block cache hits for data blocks relative to all lookups for data blocks to the block cache.
    block-cache-index-hit-ratio: The ratio of block cache hits for index blocks relative to all lookups for index blocks to the block cache.
    block-cache-filter-hit-ratio: The ratio of block cache hits for filter blocks relative to all lookups for filter blocks to the block cache.
    write-stall-duration-avg: The average duration of write stalls in ms.
    write-stall-duration-total: The total duration of write stalls in ms.
    bytes-read-compaction-rate: The average number of bytes read per second during compaction.
    bytes-written-compaction-rate: The average number of bytes written per second during compaction.
    number-open-files: The number of currently open files.
    number-file-errors-total: The total number of file errors that occurred.

    RocksDB Properties-based Metrics: All of the following properties-based metrics have a recording level of info and are recorded when the metrics are accessed. If a state store consists of multiple RocksDB instances, as is the case for WindowStores and SessionStores, each metric reports the sum over all the RocksDB instances of the state store, except for the block cache metrics block-cache-*. The block cache metrics report the sum over all RocksDB instances if each instance uses its own block cache, and they report the recorded value from only one instance if a single block cache is shared among all instances.

    Record Cache Metrics

    All of the following metrics have a recording level of debug:

    MBean name: kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
    Metric/Attribute name: Description
    hit-ratio-avg: The average cache hit ratio defined as the ratio of cache read hits over the total cache read requests.
    hit-ratio-min: The minimum cache hit ratio.
    hit-ratio-max: The maximum cache hit ratio.

    Others

    We recommend monitoring GC time and other JVM stats, as well as various server stats such as CPU utilization, I/O service time, etc. On the client side, we recommend monitoring the message/byte rate (global and per topic), request rate/size/time, and, on the consumer side, the max lag in messages among all partitions and the min fetch request rate. For a consumer to keep up, max lag needs to stay below a threshold and the min fetch rate needs to be larger than 0.
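    A minimal sketch of such a lag check, built on the records-lag-max metric from the consumer tables above (the threshold is whatever your alerting policy requires):

        import org.apache.kafka.clients.consumer.Consumer;

        public class ConsumerLagCheck {
            // Returns true if the worst per-partition lag observed in the current window exceeds the threshold.
            public static boolean isLagging(Consumer<?, ?> consumer, double maxAllowedLag) {
                return consumer.metrics().entrySet().stream()
                        .filter(e -> e.getKey().group().equals("consumer-fetch-manager-metrics"))
                        .filter(e -> e.getKey().name().equals("records-lag-max"))
                        .filter(e -> !e.getKey().tags().containsKey("partition")) // client-level metric only
                        .anyMatch(e -> ((Number) e.getValue().metricValue()).doubleValue() > maxAllowedLag);
            }
        }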