Tutorial: How to visualize and aggregate missing time-series data in Grafana

    Introduction

    In this tutorial, you’ll see how to use Grafana (an open-source visualization tool) and TimescaleDB for handling missing time-series data (using the TimescaleDB/PostgreSQL data source natively available in Grafana).

    To complete this tutorial, you will need a cursory knowledge of the Structured Query Language (SQL). The tutorial will walk you through each SQL command, but it will be helpful if you’ve seen SQL before.

    You will also need:

    • Time-series dataset with missing data (Note: in case you don’t have one handy, we include an optional step for creating one below.)

    • A working . Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial.

    • Grafana dashboard connected to your TimescaleDB instance (setup instructions)

    Step 0 - Load your time-series data into TimescaleDB and simulate missing data (optional)

    (Please skip this step if you already have TimescaleDB loaded with your time-series data.)

    For this tutorial, we are going to load our TimescaleDB instance with simulated IoT sensor data (available in our ).

    To simulate missing data, let’s delete all data our sensors collected between 1 hour and 2 hours ago:

    (For this and the following steps, we’ll use the IoT dataset from Step 0, but the steps are the same if you use your own - real or simulated - dataset).

    To confirm we’re missing data values, let’s create a simple graph that calculates the average temperature readings from sensor_1 over the past 6 hours (using time_bucket).

    There is missing data from 17:05 to 18:10, as we can see by the lack of data points (flat line) during that time period.

    Step 2 - Interpolate (fill in) the missing data

    For interpolating the missing data, we use , combined with LOCF (“Last Observation Carried Forward”). This takes the last reading before the missing data began and plots it (the last recorded value) at regular time intervals until new data is received:

    1. SELECT
    2. time_bucket_gapfill('5 minutes', "time") as time,
    3. FROM sensor_data
    4. $__timeFilter("time") AND
    5. sensor_id = 1
    6. GROUP BY time_bucket_gapfill('5 minutes', "time")
    7. ORDER BY 1

    LOCF is a handy interpolation technique when you have missing data, but no additional context to determine what the missing data values might have been.

    Grafana Screenshot: Interpolating using LOCF

    Now, we return to our original problem: wanting to aggregate data across a large time window with missing data.

    Here we use our interpolated data and compute the average temperature by 30 minute windows over the past 6 hours.

    Let’s compare this to what the aggregate would have looked like had we not interpolated the missing data, by adding a new series to the graph:

    1. time_bucket('30 minutes', "time") as time,
    2. AVG(temperature) AS sensor_1
    3. FROM sensor_data
    4. WHERE
    5. $__timeFilter("time") AND
    6. sensor_id = 1
    7. ORDER BY 1

    Grafana Screenshot: Aggregating across our interpolated data vs. missing data

    (Note that the interpolated average is now in ORANGE, while the average with missing data is GREEN.)

    As you can see above, the GREEN plot is missing a data point at 17:30, giving us little understanding of what happened during that time period, and risking breaking applications downstream. In contrast, the ORANGE plot uses our interpolated data to create a datapoint for that time period.

    Next steps

    This is just one way to use TimescaleDB with Grafana to solve data problems and ensure that your applications, systems, and operations don’t suffer any negative consequences (e.g., downtime, misbehaving applications, or a degregraded customer experience). For more ways on how to use TimescaleDB, check out our other (which range from beginner to advanced).