The most important options that affect memtable behavior are:

  • : The memtable factory object. By specifying a factory object, users can change the underlying memtable implementation and provide implementation-specific options.
  • write_buffer_size: Size of a single memtable.
  • db_write_buffer_size: Total size of memtables across column families. This can be used to manage the total memory used by memtables.
  • max_write_buffer_number: The maximum number of memtables that can build up in memory before they are flushed to SST files.
  • max_write_buffer_size_to_maintain: The amount of write history to maintain in memory, in bytes. This includes the current memtable, sealed but unflushed memtables, and flushed memtables that are kept around. RocksDB tries to keep at least this much history in memory: if dropping a flushed memtable would cause the history to fall below this threshold, that memtable is not dropped.
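The retention rule for max_write_buffer_size_to_maintain can be sketched as a small Python model. This is a simplified illustration, not RocksDB's code: the `Memtable` class and `trim_history` function are hypothetical, memtables are assumed to be ordered oldest first, and sizes are plain byte counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memtable:
    name: str
    size: int      # bytes
    flushed: bool  # True once this memtable's data is in an SST file


def trim_history(memtables: List[Memtable], max_to_maintain: int) -> List[Memtable]:
    """Drop the oldest flushed memtables while history stays >= max_to_maintain.

    `memtables` is ordered oldest first: flushed memtables kept as history,
    then sealed but unflushed memtables, then the active memtable.
    """
    kept = list(memtables)  # do not mutate the caller's list
    total = sum(m.size for m in kept)
    while kept and kept[0].flushed:
        oldest = kept[0]
        # Dropping is allowed only if the remaining history meets the threshold.
        if total - oldest.size < max_to_maintain:
            break
        kept.pop(0)
        total -= oldest.size
    return kept


tables = [
    Memtable("flushed-1", 64, True),
    Memtable("flushed-2", 64, True),
    Memtable("sealed", 64, False),
    Memtable("active", 32, False),
]
# Total history is 224 bytes. Dropping "flushed-1" leaves 160 >= 150, but also
# dropping "flushed-2" would leave 96 < 150, so "flushed-2" is kept.
print([m.name for m in trim_history(tables, 150)])  # ['flushed-2', 'sealed', 'active']
```

Unflushed memtables are never candidates here, since their data exists only in memory (and the WAL) until they are flushed.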

The default memtable implementation is based on a skip list. Besides the default, users can choose other memtable implementations, such as HashLinkList, HashSkipList, or Vector, to speed up certain queries.
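For readers unfamiliar with the structure, a minimal skip list is sketched below in Python. It only illustrates the idea of a sorted, multi-level linked structure that a skiplist-based memtable relies on; it is not RocksDB's own skip list. The `SkipList` name, the maximum height of 12 and the 1/4 promotion probability are choices made for this sketch. Because memtable entries are identified by internal keys, keys are assumed to be unique here and duplicates are rejected.

```python
import random
from typing import Iterator, List, Optional, Tuple


class _Node:
    def __init__(self, key: Optional[bytes], value: Optional[bytes], height: int) -> None:
        self.key = key
        self.value = value
        self.next: List[Optional["_Node"]] = [None] * height


class SkipList:
    """Minimal sorted skip list: keys must be unique, iteration is in key order."""

    MAX_HEIGHT = 12

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)  # seeded so runs are reproducible
        self._head = _Node(None, None, self.MAX_HEIGHT)  # sentinel, holds no key
        self._height = 1  # number of levels currently in use

    def _random_height(self) -> int:
        # Each extra level is added with probability 1/4.
        height = 1
        while height < self.MAX_HEIGHT and self._rng.random() < 0.25:
            height += 1
        return height

    def _predecessors(self, key: bytes) -> List[_Node]:
        # For each level, the last node whose key is strictly less than `key`.
        preds = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in reversed(range(self._height)):
            nxt = node.next[level]
            while nxt is not None and nxt.key < key:
                node, nxt = nxt, nxt.next[level]
            preds[level] = node
        return preds

    def insert(self, key: bytes, value: bytes) -> None:
        preds = self._predecessors(key)
        successor = preds[0].next[0]
        if successor is not None and successor.key == key:
            raise ValueError(f"duplicate key: {key!r}")
        height = self._random_height()
        self._height = max(self._height, height)
        node = _Node(key, value, height)
        for level in range(height):
            # Splice the new node in after its predecessor on every level it spans.
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def get(self, key: bytes) -> Optional[bytes]:
        node = self._predecessors(key)[0].next[0]
        return node.value if node is not None and node.key == key else None

    def items(self) -> Iterator[Tuple[bytes, bytes]]:
        # Level 0 links every node, so walking it yields all keys in order.
        node = self._head.next[0]
        while node is not None:
            yield node.key, node.value
            node = node.next[0]


sl = SkipList()
for k in [b"banana", b"apple", b"cherry"]:
    sl.insert(k, k.upper())
print([k for k, _ in sl.items()])  # [b'apple', b'banana', b'cherry']
print(sl.get(b"apple"))            # b'APPLE'
```

Higher levels skip over many nodes at once, so a search touches O(log n) nodes on average while level 0 still provides cheap ordered iteration.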

HashSkipList MemTable

As their names imply, HashSkipList organizes data in a hash table where each hash bucket is a skip list, while HashLinkList organizes data in a hash table where each hash bucket is a sorted singly linked list. Both types are designed to reduce the number of comparisons during queries. One good use case is to combine them with the PlainTable SST format and store data in RAMFS.

When looking up or inserting a key, the target key’s prefix is extracted using Options.prefix_extractor and used to find the hash bucket. Inside a hash bucket, all comparisons are done using whole (internal) keys, just as in the SkipList-based memtable.
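The lookup path described above can be modeled in a few lines of Python. This is a hypothetical sketch, not RocksDB's HashLinkList or HashSkipList: `fixed_prefix` stands in for Options.prefix_extractor (assumed here to take a fixed number of leading bytes), `HashBucketMemtable` is an invented name, and a sorted Python list stands in for each bucket's linked list or skip list.

```python
import bisect
from typing import Callable, Dict, List, Optional, Tuple


def fixed_prefix(length: int) -> Callable[[bytes], bytes]:
    """Hypothetical prefix extractor: the first `length` bytes of the key."""
    return lambda key: key[:length]


class HashBucketMemtable:
    """Hash table from key prefix to a bucket kept sorted by whole key.

    A sorted Python list stands in for the per-bucket linked list
    (HashLinkList) or skip list (HashSkipList).
    """

    def __init__(self, prefix_extractor: Callable[[bytes], bytes]) -> None:
        self._prefix = prefix_extractor
        self._buckets: Dict[bytes, List[Tuple[bytes, bytes]]] = {}

    def insert(self, key: bytes, value: bytes) -> None:
        # The prefix picks the bucket; whole keys determine order inside it.
        bucket = self._buckets.setdefault(self._prefix(key), [])
        bisect.insort(bucket, (key, value))

    def get(self, key: bytes) -> Optional[bytes]:
        bucket = self._buckets.get(self._prefix(key), [])
        # Only keys in this one bucket are ever compared against `key`.
        i = bisect.bisect_left(bucket, (key,))
        if i < len(bucket) and bucket[i][0] == key:
            return bucket[i][1]
        return None

    def keys_with_prefix(self, prefix: bytes) -> List[bytes]:
        # `prefix` must be a value the extractor produces (a whole bucket).
        return [k for k, _ in self._buckets.get(prefix, [])]


mt = HashBucketMemtable(fixed_prefix(4))
for k in [b"user:42", b"user:07", b"item:9", b"user:19"]:
    mt.insert(k, b"v-" + k)
print(mt.keys_with_prefix(b"user"))  # [b'user:07', b'user:19', b'user:42']
print(mt.get(b"item:9"))             # b'v-item:9'
```

In this layout a point lookup or a scan within one prefix touches a single bucket; iterating over all keys in total order would require merging every bucket, which the layout does not make cheap.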

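There are three scenarios in which a memtable flush can be triggered:

  1. Memtable size exceeds write_buffer_size after a write.
  2. Total memtable size across all column families exceeds , or write_buffer_manager signals a flush. In this scenario, the largest memtable is flushed.
  3. The total size of WAL files exceeds max_total_wal_size. In this scenario, the memtables backed by the oldest live WAL file are flushed, so that this WAL file can be purged.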
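The first two triggers above (per-memtable size and total memtable size) can be sketched as a small decision function. This is a simplified, hypothetical model rather than RocksDB's flush scheduling: `pick_flush` and `FlushLimits` are invented names, db_write_buffer_size is assumed to be the cross-column-family limit, and only each column family's active memtable is counted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FlushLimits:
    write_buffer_size: int     # per-memtable limit, in bytes
    db_write_buffer_size: int  # limit across column families; 0 disables the check


def pick_flush(active_sizes: Dict[str, int], limits: FlushLimits) -> Optional[str]:
    """Return the column family whose memtable should be flushed, or None.

    `active_sizes` maps column family name -> bytes in its active memtable.
    """
    # Trigger 1: a single memtable grew past write_buffer_size after a write.
    for cf, size in active_sizes.items():
        if size > limits.write_buffer_size:
            return cf
    # Trigger 2: total memtable memory across column families is too high;
    # flush the largest memtable even though it is not full.
    total = sum(active_sizes.values())
    if limits.db_write_buffer_size and total > limits.db_write_buffer_size:
        return max(active_sizes, key=active_sizes.get)
    return None


limits = FlushLimits(write_buffer_size=64, db_write_buffer_size=100)
print(pick_flush({"default": 70, "logs": 10}, limits))              # default
print(pick_flush({"default": 40, "logs": 50, "meta": 20}, limits))  # logs
print(pick_flush({"default": 40, "logs": 30}, limits))              # None
```

The second call shows a flush of a memtable ("logs", 50 bytes) that is well below write_buffer_size.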

As a result, a memtable can be flushed before it is full. This is one reason the generated SST file can be smaller than the corresponding memtable. Compression is another reason, since data in the memtable is uncompressed while SST data can be compressed.
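To see why compression alone can make an SST file smaller than the memtable it came from, the snippet below compresses some repetitive, uncompressed key/value data with Python's zlib. zlib and the sample data are stand-ins chosen for illustration; RocksDB compresses SST data block by block with a configurable codec, and real ratios depend heavily on the data.

```python
import zlib

# Hypothetical memtable contents: 1,000 key/value pairs in key order,
# concatenated without compression (as memtable data is held in memory).
entries = [
    (f"user:{i:08d}".encode(), f"status=active;tier=gold;id={i}".encode())
    for i in range(1000)
]
raw = b"".join(key + value for key, value in entries)

# Stand-in for SST block compression: one zlib stream over the whole payload.
compressed = zlib.compress(raw)
print(f"uncompressed: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```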

Concurrent Insert

In-place Update