QuestDB uses a text_loader.json configuration file which can be placed in the server’s conf directory. This file does not exist by default, but has the following implicit settings:

Given a CSV file which contains timestamps in the format yyyy-MM-dd - HH:mm:ss.SSSUUU, the following text loader configuration will provide the correct timestamp parsing:

  1. {
  2. "date": [
  3. {
  4. "format": "dd/MM/y"
  5. },
  6. {
  7. "format": "yyyy-MM-dd HH:mm:ss"
  8. },
  9. {
  10. "format": "yyyy-MM-ddTHH:mm:ss.SSSz",
  11. "locale": "en-US",
  12. "utf8": false
  13. {
  14. }
  15. ],
  16. "timestamp": [
  17. {
  18. "format": "yyyy-MM-ddTHH:mm:ss.SSSUUUz",
  19. "utf8": false
  20. },
  21. {
  22. "format": "yyyy-MM-dd - HH:mm:ss.SSSUUU",
  23. "utf8": false
  24. }
  25. ]
  26. }

The CSV data can then be loaded via POST request, for example, using cURL:

Large datasets with out-of-order data

Using the lag and batch size parameters during INSERT AS SELECT statements is a convenient strategy to load and order large datasets from CSV in bulk when they contain out-of-order data.

The batch size specifies how many records to attempt to bulk insert at one time and the lag allows for specifying the expected lateness of out-of-order timestamp values (in microseconds):

  1. SELECT * FROM unordered_table

Example

Given a large dataset with out-of-order records, an ordered table may be created using the following steps:

  1. Create a table with the schema of the imported data and apply a partitioning strategy. Records are not yet inserted due to the use of WHERE 1 != 1. The timestamp column may be cast as a timestamp if the import did not automatically detect the correct format:

    1. CREATE TABLE weather AS (
    2. SELECT
    3. cast(timestamp AS timestamp) timestamp,
    4. windDir,
    5. windSpeed,
    6. windGust,
    7. rain1H,
    8. rain6H,
    9. rain24H
    10. FROM 'weather-unordered.csv' WHERE 1 != 1
    11. ) timestamp(timestamp) PARTITION BY DAY;
  2. Insert the unordered records into the partitioned table and provide a lag and batch size:

To confirm that the table is ordered, the isOrdered() function may be used:

    isOrdered
    true