Monitoring

    Fluent Bit comes with a built-in HTTP Server that can be used to query internal information and monitor metrics of each running plugin.

    The monitoring interface integrates easily with Prometheus, since Fluent Bit supports its native format.

    NOTE: As of v1.7.0, the Windows version does not yet support the HTTP monitoring feature.

    To get started, the first step is to enable the HTTP Server from the configuration file:
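The configuration for this step is not reproduced on this page. A minimal sketch, assuming the same SERVICE keys that appear in the later examples of this document (HTTP_Server, HTTP_Listen, HTTP_PORT) and the cpu input and stdout output used there, could look like this:

```text
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name  cpu

[OUTPUT]
    Name   stdout
    Match  *
```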

    This configuration instructs Fluent Bit to start its HTTP Server on TCP port 2020, listening on all network interfaces:

        Fluent Bit v1.4.0
        * Copyright (C) 2019-2020 The Fluent Bit Authors
        * Copyright (C) 2015-2018 Treasure Data
        * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
        * https://fluentbit.io
        [2020/03/10 19:08:24] [ info] [engine] started
        [2020/03/10 19:08:24] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020

    Now a simple curl command is enough to gather some information:

        $ curl -s http://127.0.0.1:2020 | jq
        {
          "fluent-bit": {
            "version": "0.13.0",
            "edition": "Community",
            "flags": [
              "FLB_HAVE_TLS",
              "FLB_HAVE_METRICS",
              "FLB_HAVE_SQLDB",
              "FLB_HAVE_TRACE",
              "FLB_HAVE_HTTP_SERVER",
              "FLB_HAVE_FLUSH_LIBCO",
              "FLB_HAVE_SYSTEMD",
              "FLB_HAVE_VALGRIND",
              "FLB_HAVE_FORK",
              "FLB_HAVE_PROXY_GO",
              "FLB_HAVE_REGEX",
              "FLB_HAVE_SETJMP",
              "FLB_HAVE_ACCEPT4",
              "FLB_HAVE_INOTIFY"
            ]
          }
        }

    Note that the curl output is piped to the jq program, which makes the JSON data easy to read in the terminal. Fluent Bit does not aim to do JSON pretty-printing.

    REST API Interface

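    Fluent Bit aims to expose useful interfaces for monitoring. As of Fluent Bit v0.14, the available endpoints include:

    URI                           Description                           Content Type
    /                             Fluent Bit build information          JSON
    /api/v1/metrics               Internal metrics per loaded plugin    JSON
    /api/v1/metrics/prometheus    Internal metrics per loaded plugin    Prometheus Text 0.0.4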

    Metrics Examples

    Query internal metrics in JSON format with the following command:

        $ curl -s http://127.0.0.1:2020/api/v1/metrics | jq

    It should print output similar to this:

        {
          "input": {
            "cpu.0": {
              "records": 8,
              "bytes": 2536
            }
          },
          "output": {
            "stdout.0": {
              "proc_records": 5,
              "proc_bytes": 1585,
              "errors": 0,
              "retries": 0,
              "retries_failed": 0
            }
          }
        }

    Query internal metrics in Prometheus Text 0.0.4 format:

        $ curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus

    This time the same metrics are returned in Prometheus format instead of JSON.
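Because the Prometheus text format carries one sample per line, the endpoint output is easy to post-process from the shell. The sketch below sums one metric across all plugin instances. The function name sum_metric is hypothetical, and the metric name in the usage line (fluentbit_output_errors_total) is an assumption that does not appear in this document, so check it against what your endpoint actually returns.

```shell
# Sum every sample of one metric from Prometheus text-format input on stdin.
# Usage: sum_metric METRIC_NAME < metrics.txt
# Limitation of this sketch: label values containing "}" are not handled.
sum_metric() {
    awk -v metric="$1" '
        /^#/ { next }                    # skip HELP/TYPE comment lines
        {
            line = $0
            sub(/\{[^}]*\}/, "", line)   # drop the {label="..."} part
            n = split(line, f, " ")      # f[1] = metric name, f[2] = value
            if (n >= 2 && f[1] == metric) total += f[2]
        }
        END { printf "%d\n", total + 0 }
    '
}
```

A possible invocation, assuming the server from the earlier examples is running:

    curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus | sum_metric fluentbit_output_errors_total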

    Configuring Aliases

    By default, configured plugins get an internal name at runtime in the format plugin_name.ID. For monitoring purposes this can be confusing if many plugins of the same type are configured. To tell them apart, each configured input or output section can be given an alias, which is used as the parent name for the metric.

    The following example sets an alias for the INPUT section, which uses the cpu input plugin:

        [SERVICE]
            HTTP_Server  On
            HTTP_Listen  0.0.0.0
            HTTP_PORT    2020

        [INPUT]
            Name   cpu
            Alias  server1_cpu

        [OUTPUT]
            Name   stdout
            Alias  raw_output
            Match  *

    Now when querying the metrics, the aliases appear in place of the plugin names:

        {
          "input": {
            "server1_cpu": {
              "records": 8,
              "bytes": 2536
            }
          },
          "output": {
            "raw_output": {
              "proc_records": 5,
              "proc_bytes": 1585,
              "errors": 0,
              "retries": 0,
              "retries_failed": 0
            }
          }
        }

    Dashboard and Alerts

    The provided example dashboard is heavily inspired by Banzai Cloud's logging operator dashboard, but with a few key differences, such as the use of the instance label, stacked graphs, and a focus on Fluent Bit metrics.

    Sample alerts are available.

    Health Check

    Fluent Bit now supports four new configuration options to set up a health check.

    Config Name               Description                                                                    Default Value
    Health_Check              Enable the health check feature                                                Off
    HC_Errors_Count           The error count that meets the unhealthy requirement                           5
    HC_Retry_Failure_Count    The retry failure count that meets the unhealthy requirement                   5
    HC_Period                 The time period, in seconds, over which errors and retry failures are counted  60

    The feature works as follows: within the configured HC_Period, if the number of errors is over HC_Errors_Count or the number of retry failures is over HC_Retry_Failure_Count, Fluent Bit is considered unhealthy. The health endpoint then returns HTTP status 500 and the string error. Otherwise Fluent Bit is considered healthy, and the endpoint returns HTTP status 200 and the string ok.
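The decision rule can be written out as a small shell function. This is only an illustration of the rule as described here, not Fluent Bit's implementation: the function name health_verdict is hypothetical, the thresholds default to the values from the table (5 and 5), and "over" is read as strictly greater than, which is an interpretation of the wording.

```shell
# Decide health from the counters observed during one HC_Period window.
# Arguments: errors, retry_failures, then optional thresholds
# (HC_Errors_Count and HC_Retry_Failure_Count, both defaulting to 5).
# ASSUMPTION: "over" means strictly greater than the threshold.
health_verdict() {
    errors=$1
    retry_failures=$2
    max_errors=${3:-5}
    max_retry_failures=${4:-5}

    if [ "$errors" -gt "$max_errors" ] || [ "$retry_failures" -gt "$max_retry_failures" ]; then
        echo "unhealthy"   # endpoint would answer 500 / error
    else
        echo "healthy"     # endpoint would answer 200 / ok
    fi
}
```

For example, health_verdict 6 0 prints unhealthy, while health_verdict 5 5 prints healthy because neither counter is over its threshold.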

    See the config example:

        [SERVICE]
            HTTP_Server             On
            HTTP_Listen             0.0.0.0
            HTTP_PORT               2020
            Health_Check            On
            HC_Errors_Count         5
            HC_Retry_Failure_Count  5
            HC_Period               5

        [INPUT]
            Name  cpu

        [OUTPUT]
            Name   stdout
            Match  *

    The health endpoint is called over HTTP like the other endpoints. Based on the Fluent Bit status, the result will be:

    • HTTP status 200 and “ok” in response for healthy status
    • HTTP status 500 and “error” in response for unhealthy status
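The two outcomes above can be checked from a script. The sketch below is a minimal example under stated assumptions: the path /api/v1/health is not given on this page and is assumed here, the host and port are taken from the earlier examples, and both function names are hypothetical.

```shell
# Map a health-check HTTP status code to a verdict.
health_from_status() {
    case "$1" in
        200) echo "healthy" ;;
        500) echo "unhealthy" ;;
        *)   echo "unknown" ;;   # e.g. 000 when curl cannot connect
    esac
}

# Query the health endpoint and print the verdict.
# ASSUMPTION: the endpoint path is /api/v1/health on 127.0.0.1:2020.
check_health() {
    status=$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:2020/api/v1/health")
    health_from_status "$status"
}
```

Running check_health against a live instance prints healthy or unhealthy. If the server is unreachable, curl reports status 000 and the function prints unknown.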