Implementation Details - RocksDB Repairer - 《RocksDB v6.14 Documentation》

Note that the CLI command uses default options for repairing your DB and only adds the column families found in the SST files. If you need to specify any options (e.g., a custom comparator or column family-specific options), or want to specify the exact set of column families, you should choose the programmatic way.

For programmatic usage, call one of the functions declared in include/rocksdb/db.h.

CLI

For CLI usage, first build our admin CLI tool:

Looks successful. MANIFEST file is back and DB is readable:

Notice the directory. It holds files containing data that was potentially lost during recovery.

The repair process is broken into 4 phases:

Find files
Convert logs to tables
Extract metadata
Write Descriptor

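Find files

The repairer goes through all the files in the directory and classifies them based on their file name. Any file that cannot be identified by name is ignored.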
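As a rough illustration of this phase, the sketch below groups the files of a DB directory by type using only their names. It is a simplified model, not the repairer's code: the function name `classify_files` and the three patterns are assumptions chosen for the example, and the real set of recognized file types is larger.

```python
import re

# Illustrative patterns modeled on common RocksDB file names.
# This is an assumption for the example, not an exhaustive list.
_PATTERNS = [
    ("log", re.compile(r"^(\d+)\.log$")),
    ("table", re.compile(r"^(\d+)\.sst$")),
    ("descriptor", re.compile(r"^MANIFEST-(\d+)$")),
]


def classify_files(names):
    """Group file names by type, keeping the file number parsed from each name."""
    found = {"log": [], "table": [], "descriptor": []}
    ignored = []
    for name in names:
        for kind, pattern in _PATTERNS:
            match = pattern.match(name)
            if match:
                found[kind].append(int(match.group(1)))
                break
        else:
            # No pattern matched: the file is left out of the repair.
            ignored.append(name)
    return found, ignored


found, ignored = classify_files(
    ["000012.sst", "000015.log", "MANIFEST-000004", "notes.txt"]
)
print(found)
print(ignored)
```

Here `notes.txt` ends up in `ignored` because its name matches none of the patterns.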

Convert logs to tables

Every log file that is active is replayed. All sections of the file where the checksum does not match are skipped over. We intentionally give preference to data consistency.
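The skip-on-mismatch behavior can be sketched with a toy log in which each record carries a stored checksum. Everything here is an assumption made for illustration: the `(checksum, payload)` record layout, the helper names, and the use of `zlib.crc32` as a stand-in checksum. It is not the actual log format.

```python
import zlib


def make_record(payload):
    """Build a toy log record: (stored checksum, payload)."""
    return zlib.crc32(payload), payload


def replay_log(records):
    """Return the payloads whose checksum matches, and count the skipped ones."""
    recovered = []
    skipped = 0
    for stored_crc, payload in records:
        if zlib.crc32(payload) != stored_crc:
            # Checksum mismatch: skip this section rather than trust it.
            skipped += 1
            continue
        recovered.append(payload)
    return recovered, skipped


log = [
    make_record(b"put k1 v1"),
    make_record(b"put k2 v2"),
    make_record(b"put k3 v3"),
]

# Simulate corruption: keep the stored checksum but alter the payload.
crc, _ = log[1]
log[1] = (crc, b"put k2 XX")

recovered, skipped = replay_log(log)
print(recovered, skipped)
```

The corrupted record is dropped and the two intact records survive, which mirrors the stated preference for consistency over completeness.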

Extract metadata

We scan every table to compute:

the smallest and largest keys in the table
the largest sequence number in the table

If we are unable to scan the file, then we ignore the table.
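A minimal sketch of this scan, assuming each table can be read as a stream of `(key, sequence_number)` pairs. The reader callables, the `scan_table` name, and the use of `OSError` to model an unreadable file are hypothetical and exist only for this example.

```python
def scan_table(read_entries):
    """Compute key range and largest sequence number, or None if unscannable.

    read_entries is a hypothetical callable yielding (key, sequence_number).
    """
    smallest = largest = None
    max_seq = 0
    try:
        for key, seq in read_entries():
            smallest = key if smallest is None else min(smallest, key)
            largest = key if largest is None else max(largest, key)
            max_seq = max(max_seq, seq)
    except OSError:
        return None  # unable to scan the file: ignore the table
    if smallest is None:
        return None  # assumption: an empty table contributes nothing
    return {"smallest": smallest, "largest": largest, "max_seq": max_seq}


def good_table():
    yield from [(b"b", 7), (b"a", 9), (b"d", 3)]


def broken_table():
    yield (b"x", 1)
    raise OSError("simulated read error")


tables = {12: good_table, 13: broken_table}
metadata = {}
for number, reader in tables.items():
    meta = scan_table(reader)
    if meta is not None:
        metadata[number] = meta

print(metadata)
```

Only table 12 ends up in `metadata`; table 13 fails partway through the scan and is ignored as a whole.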

Write Descriptor

We generate the descriptor contents:

the log number is set to zero
the last-sequence-number is set to the largest sequence# found across all tables
compaction pointers are cleared
every table file is added at level 0
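The rules above can be sketched as a function that builds the descriptor contents from per-table metadata. The dictionary layout and field names are assumptions for illustration and do not reflect the on-disk MANIFEST format.

```python
def build_descriptor(table_metadata):
    """Build toy descriptor contents from {file_number: metadata}."""
    last_sequence = max(
        (meta["max_seq"] for meta in table_metadata.values()), default=0
    )
    return {
        "log_number": 0,                # log number is set to zero
        "last_sequence": last_sequence, # largest sequence# across all tables
        "compaction_pointers": {},      # compaction pointers are cleared
        "files": [
            {
                "number": number,
                "level": 0,             # every table file is added at level 0
                "smallest": meta["smallest"],
                "largest": meta["largest"],
            }
            for number, meta in sorted(table_metadata.items())
        ],
    }


table_metadata = {
    12: {"smallest": b"a", "largest": b"f", "max_seq": 10},
    14: {"smallest": b"e", "largest": b"h", "max_seq": 30},
}
descriptor = build_descriptor(table_metadata)
print(descriptor["last_sequence"], [f["level"] for f in descriptor["files"]])
```

Placing every file at level 0 is always safe because level 0 is allowed to contain tables with overlapping key ranges.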

Possible optimizations

Compute the total size and use it to pick an appropriate max-level M.
Sort tables by the largest sequence# in the table.
For each table: if it overlaps an earlier table, place it in level-0; otherwise place it in level-M.
We can provide options for time-consistent recovery and unsafe recovery (ignoring checksum failures when applicable).
Store per-table metadata (smallest, largest, largest-seq#, …) in the table’s meta section to speed up ScanTable.
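The level-placement idea can be sketched as follows. Two points are assumptions, since the text does not specify them: tables are sorted in ascending order of largest sequence number (oldest first), and the max level M is passed in as a parameter rather than derived from the total size.

```python
def overlaps(a, b):
    """True if the key ranges [smallest, largest] of two tables intersect."""
    return a["smallest"] <= b["largest"] and b["smallest"] <= a["largest"]


def assign_levels(tables, max_level):
    """Place each table at level 0 or max_level based on overlap with earlier tables."""
    placed = []
    levels = {}
    # Assumption: ascending order, so "earlier" means older data.
    for table in sorted(tables, key=lambda t: t["max_seq"]):
        if any(overlaps(table, earlier) for earlier in placed):
            levels[table["number"]] = 0
        else:
            levels[table["number"]] = max_level
        placed.append(table)
    return levels


tables = [
    {"number": 12, "smallest": b"a", "largest": b"f", "max_seq": 10},
    {"number": 13, "smallest": b"k", "largest": b"p", "max_seq": 20},
    {"number": 14, "smallest": b"e", "largest": b"h", "max_seq": 30},
]
levels = assign_levels(tables, max_level=4)
print(levels)
```

Tables 12 and 13 have disjoint key ranges and go to level 4, while table 14 overlaps table 12 and stays in level 0, where overlapping ranges are permitted and newer data is consulted first.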

If a column family was created recently and has not yet been persisted in SST files by a flush, it will be dropped during the repair process. Because of this limitation, repair might even damage a healthy DB if its column families have not been flushed yet.
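This limitation can be modeled in a few lines: if the set of column families is rebuilt only from what the SST files mention, a column family with no flushed data disappears. The function and field names below are hypothetical and serve only to illustrate the effect.

```python
def column_families_after_repair(sst_files, existing_cfs):
    """Return (recovered, dropped) column families in this toy model."""
    # Only column families that appear in some SST file are recovered.
    recovered = {f["column_family"] for f in sst_files}
    dropped = set(existing_cfs) - recovered
    return recovered, dropped


sst_files = [
    {"number": 12, "column_family": "default"},
    {"number": 14, "column_family": "users"},
]
# "sessions" was created recently and has not been flushed to any SST file.
existing_cfs = ["default", "users", "sessions"]

recovered, dropped = column_families_after_repair(sst_files, existing_cfs)
print(sorted(recovered), sorted(dropped))
```

In this model, flushing every column family before running a repair on a healthy DB would leave `dropped` empty.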