Part 9 - Binary Search and Duplicate Keys

    Last time we noted that we’re still storing keys in unsorted order. We’re going to fix that problem, plus detect and reject duplicate keys.

    Right now, our function always chooses to insert at the end of the table. Instead, we should search the table for the correct place to insert, then insert there. If the key already exists there, return an error.

    We don’t need the table_end() function anymore.

    1. -Cursor* table_end(Table* table) {
    2. - Cursor* cursor = malloc(sizeof(Cursor));
    3. - cursor->table = table;
    4. - cursor->page_num = table->root_page_num;
    5. -
    6. - void* root_node = get_page(table->pager, table->root_page_num);
    7. - uint32_t num_cells = *leaf_node_num_cells(root_node);
    8. - cursor->cell_num = num_cells;
    9. - cursor->end_of_table = true;
    10. -
    11. - return cursor;
    12. -}
    1. +/*
    2. +Return the position of the given key.
    3. +If the key is not present, return the position
    4. +where it should be inserted
    5. +*/
    6. +Cursor* table_find(Table* table, uint32_t key) {
    7. + uint32_t root_page_num = table->root_page_num;
    8. + void* root_node = get_page(table->pager, root_page_num);
    9. + return leaf_node_find(table, root_page_num, key);
    10. + } else {
    11. + printf("Need to implement searching an internal node\n");
    12. + exit(EXIT_FAILURE);
    13. + }
    14. +}

    I’m stubbing out the branch for internal nodes because we haven’t implemented internal nodes yet. We can search the leaf node with binary search.

    This will either return

    • the position of the key,
    • the position of another key that we’ll need to move if we want to insert the new key, or
    • the position one past the last key
      Since we’re now checking node type, we need functions to get and set that value in a node.
    1. +NodeType get_node_type(void* node) {
    2. + uint8_t value = *((uint8_t*)(node + NODE_TYPE_OFFSET));
    3. + return (NodeType)value;
    4. +}
    5. +
    6. +void set_node_type(void* node, NodeType type) {
    7. + uint8_t value = type;
    8. + *((uint8_t*)(node + NODE_TYPE_OFFSET)) = value;
    9. +}

    We have to cast to uint8_t first to ensure it’s serialized as a single byte.

    We also need to initialize node type.

    1. -void initialize_leaf_node(void* node) { *leaf_node_num_cells(node) = 0; }
    2. +void initialize_leaf_node(void* node) {
    3. + set_node_type(node, NODE_LEAF);
    1. case (EXECUTE_SUCCESS):
    2. printf("Executed.\n");
    3. break;
    4. + case (EXECUTE_DUPLICATE_KEY):
    5. + printf("Error: Duplicate key.\n");
    6. + break;
    7. case (EXECUTE_TABLE_FULL):
    8. printf("Error: Table full.\n");
    9. break;

    With these changes, our test can change to check for sorted order:

    1. "db > Executed.",
    2. "db > Tree:",
    3. "leaf (size 3)",
    4. - " - 0 : 3",
    5. - " - 1 : 1",
    6. - " - 2 : 2",
    7. + " - 0 : 1",
    8. + " - 1 : 2",
    9. + " - 2 : 3",
    10. "db > "
    11. ])

    And we can add a new test for duplicate keys:

    That’s it! Next up: implement splitting leaf nodes and creating internal nodes.

    Part 8 - B-Tree Leaf Node Format