SummingMergeTree
We recommend to use the engine together with MergeTree
. Store complete data in MergeTree
table, and use SummingMergeTree
for aggregated data storing, for example, when preparing reports. Such an approach will prevent you from losing valuable data due to an incorrectly composed primary key.
For a description of request parameters, see request description.
Parameters of SummingMergeTree
Query clauses
When creating a SummingMergeTree
table the same are required, as when creating a MergeTree
table.
Deprecated Method for Creating a Table
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
) ENGINE [=] SummingMergeTree(date-column [, sampling_expression], (primary, key), index_granularity, [columns])
All of the parameters excepting columns
have the same meaning as in MergeTree
.
Consider the following table:
Insert data to it:
INSERT INTO summtt Values(1,1),(1,2),(2,1)
ClickHouse may sum all the rows not completely (see below), so we use an aggregate function sum
and GROUP BY
clause in the query.
┌─key─┬─sum(value)─┐
│ 2 │ 1 │
│ 1 │ 3 │
└─────┴────────────┘
When data are inserted into a table, they are saved as-is. Clickhouse merges the inserted parts of data periodically and this is when rows with the same primary key are summed and replaced with one for each resulting part of data.
ClickHouse can merge the data parts so that different resulting parts of data cat consist rows with the same primary key, i.e. the summation will be incomplete. Therefore () an aggregate function and GROUP BY
clause should be used in a query as described in the example above.
The values in the columns with the numeric data type are summarized. The set of columns is defined by the parameter columns
.
If column is not in the primary key and is not summarized, an arbitrary value is selected from the existing ones.
The values are not summarized for columns in the primary key.
For columns of AggregateFunction type ClickHouse behaves as engine aggregating according to the function.
Table can have nested data structures that are processed in a special way.
If the name of a nested table ends with Map
and it contains at least two columns that meet the following criteria:
- the first column is numeric
(*Int*, Date, DateTime)
or a string(String, FixedString)
, let’s call itkey
,
then this nested table is interpreted as a mapping of key => (values...)
, and when merging its rows, the elements of two data sets are merged by key
with a summation of the corresponding (values...)
.
Examples:
When requesting data, use the sumMap(key, value) function for aggregation of .