VACUUM
For example, in the Hive Connector you can update or delete rows of an ORC transactional table one at a time. However, each update generates new delta and delete_delta files in the HDFS file system. Use VACUUM to merge these small files into larger ones, which improves parallelism and performance.
Types of VACUUMs:
Default
Default vacuum can be treated as the first level of merging the small data sets of a table. It is run frequently and is usually faster than FULL vacuum.
Hive:
Default vacuum corresponds to ‘Minor Compaction’ in the Hive Connector.
FULL
FULL vacuum can be treated as the next level, merging all data sets of the table. It is run less frequently and takes longer to complete than default vacuum.
Hive:
FULL vacuum corresponds to ‘Major Compaction’ in the Hive Connector. It merges all base and delta files together. As part of this operation, deleted or updated rows are permanently removed, and all aborted transactions are removed from the transaction table in the metastore. The old delta files are removed once all readers have finished reading them.
The FULL keyword indicates whether to start a Major Compaction. Without this option, a Minor Compaction is performed.
Use AND WAIT to run this vacuum in synchronous mode. Without this option, it runs in asynchronous mode.
Example 1: Default vacuum and wait for completion:
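The statement for this example is missing from the page. A minimal sketch follows, assuming the statement form VACUUM TABLE table_name [AND WAIT]. The table name compact_test_table is a hypothetical placeholder.

```sql
-- Sketch only: the table name is hypothetical and the statement form is assumed.
-- No FULL keyword, so this requests a default vacuum (Minor Compaction in the Hive Connector).
-- AND WAIT makes the call synchronous: it returns when the vacuum completes.
VACUUM TABLE compact_test_table AND WAIT;
```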
Example 2: FULL vacuum on partition ‘partition_key=p1’:
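The statement for this example is also missing. The sketch below assumes that a partition is selected with a PARTITION clause placed after FULL and given as a quoted 'key=value' string. The table name compact_test_table_with_partition is a hypothetical placeholder.

```sql
-- Sketch only: the table name is hypothetical and the PARTITION clause form is assumed.
-- FULL requests a Major Compaction, restricted here to the partition partition_key=p1.
-- Without AND WAIT the vacuum runs asynchronously.
VACUUM TABLE compact_test_table_with_partition FULL PARTITION 'partition_key=p1';
```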
Example 3: FULL vacuum and wait for completion:
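As with the other examples, the original statement is not present here. A minimal sketch under the same assumed statement form, with a hypothetical table name, could look like this:

```sql
-- Sketch only: the table name is hypothetical and the statement form is assumed.
-- FULL requests a Major Compaction of the whole table.
-- AND WAIT blocks until the compaction has finished.
VACUUM TABLE compact_test_table_with_partition FULL AND WAIT;
```

Because a FULL vacuum takes longer than a default one, the synchronous form may block for a long time on large tables.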