ReplacingMergeTree
Data deduplication occurs only during a merge. Merges run in the background at an unknown time, so you can’t plan around them, and some of the data may remain unprocessed. Although you can run an unscheduled merge using the OPTIMIZE query, do not count on it, because the OPTIMIZE query reads and writes a large amount of data.
Thus, ReplacingMergeTree is suitable for clearing out duplicate data in the background in order to save space, but it does not guarantee the absence of duplicates.
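As a rough illustration of the behavior described above, the following sketch inserts the same sorting key twice and shows that duplicates can persist until a merge happens. The table `events_dedup` and its columns are hypothetical names chosen for this example. The `FINAL` modifier on `SELECT` is a query-time alternative that this section does not discuss, and it carries its own read cost.

```sql
-- Hypothetical table for illustration; names are not from the original text.
CREATE TABLE IF NOT EXISTS events_dedup
(
    event_id UInt64,
    payload  String
)
ENGINE = ReplacingMergeTree
ORDER BY event_id;

-- Two separate inserts create two data parts that share a sorting key.
INSERT INTO events_dedup VALUES (1, 'first');
INSERT INTO events_dedup VALUES (1, 'second');

-- Until a background merge combines the parts, this may return both rows.
SELECT * FROM events_dedup;

-- Forces an unscheduled merge; it rewrites data, so it is expensive on large tables.
OPTIMIZE TABLE events_dedup FINAL;

-- Deduplicates at query time instead of relying on a merge having happened.
SELECT * FROM events_dedup FINAL;
```

Queries that must never see duplicates should not depend on the plain `SELECT`, since the timing of background merges is not under your control.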
For a description of request parameters, see statement description.
ReplacingMergeTree Parameters
ver - column with the version number. Type UInt*, Date, DateTime or DateTime64. Optional parameter.
When merging, ReplacingMergeTree leaves only one of all the rows with the same sorting key:
- The last in the selection, if ver is not set. A selection is a set of rows in a set of parts participating in the merge. The most recently created part (the last insert) will be the last one in the selection. Thus, after deduplication, the very last row from the most recent insert will remain for each unique sorting key.
- The row with the maximum version, if ver is set.
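The effect of ver can be sketched as follows. With a version column set, the engine keeps the row with the highest version for each sorting key rather than the most recently inserted one. The table `user_profiles` and its columns are hypothetical, and `FINAL` is used here only to observe the deduplicated result without waiting for a background merge.

```sql
-- Hypothetical table for illustration; the ver column holds the row version.
CREATE TABLE IF NOT EXISTS user_profiles
(
    user_id UInt64,
    email   String,
    ver     UInt32
)
ENGINE = ReplacingMergeTree(ver)
ORDER BY user_id;

-- The higher version is inserted first on purpose.
INSERT INTO user_profiles VALUES (1, 'new@example.com', 2);
INSERT INTO user_profiles VALUES (1, 'old@example.com', 1);

-- After deduplication the row with ver = 2 is expected to remain,
-- even though it did not come from the most recent insert.
SELECT * FROM user_profiles FINAL;
```

Without the `ver` argument, the same two inserts would be expected to leave the `old@example.com` row, because it comes from the last insert in the selection.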
Query clauses
Deprecated Method for Creating a Table
Attention
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
...
) ENGINE [=] ReplacingMergeTree(date-column [, sampling_expression], (primary, key), index_granularity, [ver])
All of the parameters except ver have the same meaning as in MergeTree.
ver - column with the version. Optional parameter. For a description, see the text above.
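When moving an old table definition off the deprecated form, the positional arguments map onto separate clauses. The sketch below is one plausible translation, not a guaranteed equivalent: it assumes the date column corresponds to monthly partitioning, as is conventional for the legacy MergeTree syntax, and all table and column names are hypothetical.

```sql
-- Deprecated form, shown only for comparison (hypothetical columns):
--   ENGINE = ReplacingMergeTree(EventDate, (CounterID, UserID), 8192, ver)

-- Assumed modern equivalent: each positional argument becomes its own clause.
CREATE TABLE IF NOT EXISTS hits_replacing
(
    EventDate Date,
    CounterID UInt32,
    UserID    UInt64,
    ver       UInt32
)
ENGINE = ReplacingMergeTree(ver)      -- only ver stays an engine argument
PARTITION BY toYYYYMM(EventDate)      -- replaces the date-column argument
ORDER BY (CounterID, UserID)          -- replaces the (primary, key) tuple
SETTINGS index_granularity = 8192;    -- replaces the index_granularity argument
```

If the old definition also had a sampling expression, it would go into a `SAMPLE BY` clause, and that expression must be part of the sorting key.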