Hypertable Basics
Hypertables in TimescaleDB are designed to be easy to manage and to behave predictably to users familiar with standard PostgreSQL tables. Along those lines, SQL commands to create, alter, or delete (hyper)tables in TimescaleDB are identical to those in PostgreSQL. Even though hypertables are comprised of many interlinked “chunk” tables, commands made to the hypertable automatically propagate changes down to all of the chunks belonging to that hypertable.
Creating a hypertable is a two-step process.
Create a standard table (PostgreSQL docs).
You can execute standard ALTER TABLE
commands against the hypertable ().
It’s just the standard command, where TimescaleDB will correspondingly delete all chunks belonging to the hypertable.
Users of TimescaleDB often have two common questions:
- How large should I configure my intervals for time partitioning?
- Should I use space partitioning, and how many space partitions should I use?
Time intervals: Users can use the default time interval, which is 7 days starting in v0.11.0 and 1 month prior to v0.11.0. Alternatively, users can explicitly configure time intervals by setting chunk_time_interval
when creating a hypertable. After the hypertable has been created, the interval used for new chunks can be changed by calling set_chunk_time_interval
.
The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks if using space partitions) fit into memory. As such, we typically recommend setting the interval so that these chunk(s) comprise no more than 25% of main memory.
If you want to see the current interval length for your hypertables, you can check the as follows. Note that for time-based interval lenghts, these are reported in microseconds.
To determine this, you need to have a general idea of your data rate. If you are writing roughly 2GB of data per day and have 64GB of memory, setting the time interval to a week would be good. If you are writing 10GB per day on the same machine, setting the time interval to a day would be appropriate. This interval would also hold if data is loaded more in batches, e.g., you bulk load 70GB of data per week, with data corresponding to records from throughout the week.
Space partitions: Space partitioning is optional but can make sense for certain types of data and is recommended when using distributed hypertables.
Space partitions use hashing: Every distinct item is hashed to one of N buckets. Remember that we are already using (flexible) time intervals to manage chunk sizes; the main purpose of space partitioning is to enable parallel I/O to the same time interval.
Parallel I/O can benefit in two scenarios: (a) two or more concurrent queries should be able to read from different disks in parallel, or (b) a single query should be able to use query parallelization to read from multiple disks in parallel.
Thus, users looking for parallel I/O have two options:
For each physical disk, add a separate tablespace to the database. TimescaleDB allows you to actually add multiple tablespaces to a single hypertable (although under the covers, each underlying chunk will be mapped by TimescaleDB to a single tablespace / physical disk).
We recommend a RAID setup when possible, as it supports both forms of parallelization described above (i.e., separate queries to separate disks, single query to multiple disks in parallel). The multiple tablespace approach only supports the former. With a RAID setup, no spatial partitioning is required.
TimescaleDB does not benefit from a very large number of space partitions (such as the number of unique items you expect in partition field). A very large number of such partitions leads both to poorer per-partition load balancing (the mapping of items to partitions using hashing), as well as much increased planning latency for some types of queries.