SummingMergeTree

    We recommend to use the engine together with MergeTree. Store complete data in MergeTree table, and use SummingMergeTree for aggregated data storing, for example, when preparing reports. Such an approach will prevent you from losing valuable data due to an incorrectly composed primary key.

    For a description of request parameters, see request description.

    Parameters of SummingMergeTree

    Query clauses

    When creating a SummingMergeTree table the same are required, as when creating a MergeTree table.

    Deprecated Method for Creating a Table

    Do not use this method in new projects and, if possible, switch the old projects to the method described above.

    1. CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
    2. (
    3. name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    4. name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    5. ) ENGINE [=] SummingMergeTree(date-column [, sampling_expression], (primary, key), index_granularity, [columns])

    All of the parameters excepting columns have the same meaning as in MergeTree.

    Consider the following table:

    Insert data to it:

    1. INSERT INTO summtt Values(1,1),(1,2),(2,1)

    ClickHouse may sum all the rows not completely (see below), so we use an aggregate function sum and GROUP BY clause in the query.

    1. ┌─key─┬─sum(value)─┐
    2. 2 1
    3. 1 3
    4. └─────┴────────────┘

    When data are inserted into a table, they are saved as-is. Clickhouse merges the inserted parts of data periodically and this is when rows with the same primary key are summed and replaced with one for each resulting part of data.

    ClickHouse can merge the data parts so that different resulting parts of data cat consist rows with the same primary key, i.e. the summation will be incomplete. Therefore () an aggregate function and GROUP BY clause should be used in a query as described in the example above.

    The values in the columns with the numeric data type are summarized. The set of columns is defined by the parameter columns.

    If column is not in the primary key and is not summarized, an arbitrary value is selected from the existing ones.

    The values are not summarized for columns in the primary key.

    For columns of AggregateFunction type ClickHouse behaves as engine aggregating according to the function.

    Table can have nested data structures that are processed in a special way.

    If the name of a nested table ends with Map and it contains at least two columns that meet the following criteria:

    • the first column is numeric (*Int*, Date, DateTime) or a string (String, FixedString), let’s call it key,

    then this nested table is interpreted as a mapping of key => (values...), and when merging its rows, the elements of two data sets are merged by key with a summation of the corresponding (values...).

    Examples:

    When requesting data, use the sumMap(key, value) function for aggregation of .