Tutorial: How to simulate a basic IoT sensor dataset on PostgreSQL or TimescaleDB

    Introduction

    The Internet of Things (IoT) is all about the data: the datasets generated by connected devices ("things") are generally time-series in nature, with relational metadata that describes those things.

    Often, it is necessary to simulate IoT sensor data, for example when testing a new system. In this tutorial, we show how to simulate a basic IoT sensor dataset on PostgreSQL or TimescaleDB.

    Note: To create a more advanced simulated IoT dataset, try the Time Series Benchmark Suite (TSBS) on GitHub.

    To complete this tutorial, you will need a basic working knowledge of SQL (Structured Query Language). The tutorial walks you through each SQL command, but it will be helpful if you have seen SQL before.

    To start, install PostgreSQL or TimescaleDB. Once your installation is complete, we can proceed to creating the sample data and finishing the tutorial.

    Step 1 - Set up your tables

    First, connect to your database via psql or another SQL client.

    Second, create the “sensors” and “sensor_data” tables.
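    Start with the sensors table; a minimal definition that satisfies the foreign key and the inserts used later (the exact column types are an assumption) is:

    CREATE TABLE sensors (
      id SERIAL PRIMARY KEY,
      type VARCHAR(50),
      location VARCHAR(50)
    );

    Then create the sensor_data table that references it: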

    CREATE TABLE sensor_data (
      time TIMESTAMPTZ NOT NULL,
      sensor_id INTEGER,
      temperature DOUBLE PRECISION,
      cpu DOUBLE PRECISION,
      FOREIGN KEY (sensor_id) REFERENCES sensors (id)
    );
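    If you are using plain PostgreSQL, you may also want an index on the time column to speed up the time-ordered queries later in this tutorial (TimescaleDB creates one automatically when the table is converted to a hypertable in the next step); for example:

    CREATE INDEX ON sensor_data (time DESC);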

    Third (only if you are using TimescaleDB), convert the sensor_data table into a hypertable:

    SELECT create_hypertable('sensor_data', 'time');

    Fourth, populate the sensors table with four sensors:

    INSERT INTO sensors (type, location) VALUES
      ('a', 'floor'),
      ('a', 'ceiling'),
      ('b', 'floor'),
      ('b', 'ceiling');

    Fifth, verify that the sensors were created correctly:

    SELECT * FROM sensors;

    After running that last SQL statement, you should see something like this:
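    Assuming a fresh sensors table with a SERIAL id column (as sketched above), the ids will simply run from 1 to 4:

    id | type | location
    ----+------+----------
    1 | a | floor
    2 | a | ceiling
    3 | b | floor
    4 | b | ceiling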

    Step 2 - Simulate a basic IoT sensor dataset

    Note: For the following sections we’ll share the results of our queries as an example, but since the tutorial generates random data every time it is run, your results will look different (though they will be structured the same way).

    First, generate a dataset for all four of our sensors and insert it into the sensor_data table:

    INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
    SELECT
      time,
      sensor_id,
      random() AS cpu,
      random()*100 AS temperature
    FROM generate_series(now() - interval '24 hour', now(), interval '5 minute') AS g1(time),
         generate_series(1,4,1) AS g2(sensor_id);
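    As an optional sanity check, you can count the rows produced by the same cross join: a 24-hour window at 5-minute steps yields 289 timestamps, and with 4 sensors that comes to 1,156 rows:

    SELECT COUNT(*)
    FROM generate_series(now() - interval '24 hour', now(), interval '5 minute') AS g1(time),
         generate_series(1,4,1) AS g2(sensor_id);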

    Second, verify that the simulated sensor data was written correctly:

    SELECT * FROM sensor_data ORDER BY time;

    SAMPLE OUTPUT:

    time | sensor_id | temperature | cpu
    -------------------------------+-----------+------------------+--------------------
    2020-03-31 15:56:25.843575+00 | 1 | 6.86688972637057 | 0.682070567272604
    2020-03-31 15:56:40.244287+00 | 2 | 26.589260622859 | 0.229583469685167
    2020-03-31 15:56:45.653115+00 | 3 | 79.9925176426768 | 0.457779890391976
    2020-03-31 15:56:53.560205+00 | 4 | 24.3201029952615 | 0.641885648947209
    2020-03-31 16:01:25.843575+00 | 1 | 33.3203678019345 | 0.0159163917414844
    2020-03-31 16:01:40.244287+00 | 2 | 31.2673618085682 | 0.701185956597328
    2020-03-31 16:01:45.653115+00 | 3 | 85.2960689924657 | 0.693413889966905
    2020-03-31 16:01:53.560205+00 | 4 | 79.4769988860935 | 0.360561791341752
    ...

    Congrats! We’ve created a basic IoT sensor dataset. Now let’s run some queries.

    Step 3 - Run basic queries (optional)

    Note: This section requires TimescaleDB

    Average temperature and average cpu by 30-minute windows:

    SELECT
      time_bucket('30 minutes', time) AS period,
      AVG(temperature) AS avg_temp,
      AVG(cpu) AS avg_cpu
    FROM sensor_data
    GROUP BY period;
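    If you are on plain PostgreSQL without TimescaleDB, a rough equivalent of this 30-minute bucketing can be written with date_bin (available in PostgreSQL 14 and later); this is a sketch, with an arbitrary origin timestamp:

    SELECT
      date_bin('30 minutes', time, TIMESTAMPTZ '2020-01-01') AS period,
      AVG(temperature) AS avg_temp,
      AVG(cpu) AS avg_cpu
    FROM sensor_data
    GROUP BY period;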

    Average and last temperature, average cpu by 30-minute windows:

    But what if we want not just the average temperature for each period, but also the last temperature? (For example, if we wanted to know the final temperature value at the end of each interval.)

    SELECT
      time_bucket('30 minutes', time) AS period,
      AVG(temperature) AS avg_temp,
      last(temperature, time) AS last_temp,
      AVG(cpu) AS avg_cpu
    FROM sensor_data
    GROUP BY period;

    SAMPLE OUTPUT:

    period | avg_temp | last_temp | avg_cpu
    ------------------------+------------------+------------------+-------------------
    2020-03-31 19:00:00+00 | 49.6615830013373 | 84.3963081017137 | 0.477344429974134
    2020-03-31 16:00:00+00 | 50.4250325243144 | 43.5192013625056 | 0.511075591299838
    2020-03-31 17:30:00+00 | 49.0742547437549 | 22.740753274411 | 0.527267253802468
    2020-04-01 14:30:00+00 | 49.3416377226822 | 59.1331578791142 | 0.438027751864865
    ...

    Using the sensor metadata

    Now let’s take advantage of some of the metadata we have stored in the sensors table:

    SELECT
      sensors.location,
      time_bucket('30 minutes', time) AS period,
      AVG(temperature) AS avg_temp,
      last(temperature, time) AS last_temp,
      AVG(cpu) AS avg_cpu
    FROM sensor_data JOIN sensors ON sensor_data.sensor_id = sensors.id
    GROUP BY period, sensors.location;

    SAMPLE OUTPUT:

    location | period | avg_temp | last_temp | avg_cpu
    ----------+------------------------+------------------+-------------------+-------------------
    ceiling | 2020-03-31 15:30:00+00 | 25.4546818090603 | 24.3201029952615 | 0.435734559316188
    floor | 2020-03-31 15:30:00+00 | 43.4297036845237 | 79.9925176426768 | 0.56992522883229
    ceiling | 2020-03-31 16:00:00+00 | 53.8454438598516 | 43.5192013625056 | 0.490728285357666
    floor | 2020-03-31 16:00:00+00 | 47.0046211887772 | 23.0230117216706 | 0.53142289724201
    ceiling | 2020-03-31 16:30:00+00 | 58.7817596504465 | 63.6621567420661 | 0.488188337767497
    floor | 2020-03-31 16:30:00+00 | 44.611586847653 | 2.21919436007738 | 0.434762630766879
    ceiling | 2020-03-31 17:00:00+00 | 35.7026890735142 | 42.9420990403742 | 0.550129583687522
    floor | 2020-03-31 17:00:00+00 | 62.2794370166957 | 52.6636955793947 | 0.454323202022351
    ...
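    As a variation, you can also filter on the metadata. For example, here is a sketch that reuses the query above with a WHERE clause to restrict the aggregation to the ceiling sensors only:

    SELECT
      time_bucket('30 minutes', time) AS period,
      AVG(temperature) AS avg_temp,
      last(temperature, time) AS last_temp,
      AVG(cpu) AS avg_cpu
    FROM sensor_data JOIN sensors ON sensor_data.sensor_id = sensors.id
    WHERE sensors.location = 'ceiling'
    GROUP BY period;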

    Congratulations! You now have a basic IoT sensor dataset you can use for testing in PostgreSQL or TimescaleDB.

    To learn more about TimescaleDB, or about the TimescaleDB concepts and functions we just used (such as create_hypertable, time_bucket, and last), please visit the TimescaleDB developer documentation.