Part 4 - Our First Tests (and Bugs)

    We’ve got the ability to insert rows into our database and to print out all rows. Let’s take a moment to test what we’ve got so far.

    I’m going to use rspec to write my tests because I’m familiar with it, and the syntax is fairly readable.

    I’ll define a short helper to send a list of commands to our database program then make assertions about the output:

    This simple test makes sure we get back what we put in. And indeed it passes:

    1. .
    2. Finished in 0.00871 seconds (files took 0.09506 seconds to load)
    3. 1 example, 0 failures

    Now it’s feasible to test inserting a large number of rows into the database:

    1. it 'prints error message when table is full' do
    2. script = (1..1401).map do |i|
    3. "insert #{i} user#{i} person#{i}@example.com"
    4. end
    5. script << ".exit"
    6. result = run_script(script)
    7. expect(result[-2]).to eq('db > Error: Table full.')
    8. end

    Running tests again…

    1. bundle exec rspec
    2. ..
    3. Finished in 0.01553 seconds (files took 0.08156 seconds to load)
    4. 2 examples, 0 failures

    Sweet, it works! Our db can hold 1400 rows right now because we set the maximum number of pages to 100, and 14 rows can fit in a page.

    Reading through the code we have so far, I realized we might not handle storing text fields correctly. Easy to test with this example:

    1. it 'allows inserting strings that are the maximum length' do
    2. long_username = "a"*32
    3. long_email = "a"*255
    4. script = [
    5. "insert 1 #{long_username} #{long_email}",
    6. "select",
    7. ".exit",
    8. ]
    9. result = run_script(script)
    10. expect(result).to match_array([
    11. "db > Executed.",
    12. "db > (1, #{long_username}, #{long_email})",
    13. "Executed.",
    14. "db > ",
    15. ])
    16. end
    1. Failures:
    2. 1) database allows inserting strings that are the maximum length
    3. Failure/Error: raw_output.split("\n")
    4. ArgumentError:
    5. invalid byte sequence in UTF-8
    6. # ./spec/main_spec.rb:14:in `split'
    7. # ./spec/main_spec.rb:14:in `run_script'
    8. # ./spec/main_spec.rb:48:in `block (2 levels) in <top (required)>'

    If we try it ourselves, we’ll see that there’s some weird characters when we try to print out the row. (I’m abbreviating the long strings):

    What’s going on? If you take a look at our definition of a Row, we allocate exactly 32 bytes for username and exactly 255 bytes for email. But are supposed to end with a null character, which we didn’t allocate space for. The solution is to allocate one additional byte:

    1. const uint32_t COLUMN_EMAIL_SIZE = 255;
    2. struct Row_t {
    3. uint32_t id;
    4. - char username[COLUMN_USERNAME_SIZE];
    5. - char email[COLUMN_EMAIL_SIZE];
    6. + char username[COLUMN_USERNAME_SIZE + 1];
    7. + char email[COLUMN_EMAIL_SIZE + 1];
    8. };
    9. typedef struct Row_t Row;

    And indeed that fixes it:

    1. bundle exec rspec
    2. ...
    3. Finished in 0.0188 seconds (files took 0.08516 seconds to load)
    4. 3 examples, 0 failures

    We should not allow inserting usernames or emails that are longer than column size. The spec for that looks like this:

    1. it 'prints error message if strings are too long' do
    2. long_username = "a"*33
    3. long_email = "a"*256
    4. script = [
    5. "insert 1 #{long_username} #{long_email}",
    6. "select",
    7. ".exit",
    8. ]
    9. result = run_script(script)
    10. expect(result).to match_array([
    11. "db > String is too long.",
    12. "db > Executed.",
    13. "db > ",
    14. ])
    15. end

    In order to do this we need to upgrade our parser. As a reminder, we’re currently using scanf():

    1. if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
    2. statement->type = STATEMENT_INSERT;
    3. int args_assigned = sscanf(
    4. input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
    5. statement->row_to_insert.username, statement->row_to_insert.email);
    6. if (args_assigned < 3) {
    7. return PREPARE_SYNTAX_ERROR;
    8. }
    9. return PREPARE_SUCCESS;
    10. }

    But . If the string it’s reading is larger than the buffer it’s reading into, it will cause a buffer overflow and start writing into unexpected places. We want to check the length of each string before we copy it into a Row structure. And to do that, we need to divide the input by spaces.

    I’m going to use strtok() to do that. I think it’s easiest to understand if you see it in action:

    1. +PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
    2. + char* keyword = strtok(input_buffer->buffer, " ");
    3. + char* id_string = strtok(NULL, " ");
    4. + char* username = strtok(NULL, " ");
    5. + char* email = strtok(NULL, " ");
    6. +
    7. + if (id_string == NULL || username == NULL || email == NULL) {
    8. + return PREPARE_SYNTAX_ERROR;
    9. + }
    10. +
    11. + int id = atoi(id_string);
    12. + if (strlen(username) > COLUMN_USERNAME_SIZE) {
    13. + return PREPARE_STRING_TOO_LONG;
    14. + }
    15. + if (strlen(email) > COLUMN_EMAIL_SIZE) {
    16. + return PREPARE_STRING_TOO_LONG;
    17. + }
    18. +
    19. + statement->row_to_insert.id = id;
    20. + strcpy(statement->row_to_insert.username, username);
    21. + strcpy(statement->row_to_insert.email, email);
    22. +
    23. + return PREPARE_SUCCESS;
    24. +}
    25. +
    26. PrepareResult prepare_statement(InputBuffer* input_buffer,
    27. Statement* statement) {
    28. if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
    29. + return prepare_insert(input_buffer, statement);
    30. - statement->type = STATEMENT_INSERT;
    31. - int args_assigned = sscanf(
    32. - input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
    33. - statement->row_to_insert.username, statement->row_to_insert.email);
    34. - if (args_assigned < 3) {
    35. - return PREPARE_SYNTAX_ERROR;
    36. - }
    37. - return PREPARE_SUCCESS;
    38. }

    Calling strtok successively on the the input buffer breaks it into substrings by inserting a null character whenever it reaches a delimiter (space, in our case). It returns a pointer to the start of the substring.

    We can call on each text value to see if it’s too long.

    1. switch (prepare_statement(input_buffer, &statement)) {
    2. case (PREPARE_SUCCESS):
    3. break;
    4. + case (PREPARE_STRING_TOO_LONG):
    5. + printf("String is too long.\n");
    6. + continue;
    7. case (PREPARE_SYNTAX_ERROR):
    8. printf("Syntax error. Could not parse statement.\n");
    9. continue;

    Which makes our test pass

    1. bundle exec rspec
    2. ....
    3. Finished in 0.02284 seconds (files took 0.116 seconds to load)
    4. 4 examples, 0 failures

    While we’re here, we might as well handle one more error case:

    1. it 'prints an error message if id is negative' do
    2. script = [
    3. "insert -1 cstack foo@bar.com",
    4. "select",
    5. ".exit",
    6. ]
    7. result = run_script(script)
    8. expect(result).to match_array([
    9. "db > ID must be positive.",
    10. "db > Executed.",
    11. "db > ",
    12. ])
    13. end
    1. enum PrepareResult_t {
    2. PREPARE_SUCCESS,
    3. + PREPARE_NEGATIVE_ID,
    4. PREPARE_STRING_TOO_LONG,
    5. PREPARE_SYNTAX_ERROR,
    6. PREPARE_UNRECOGNIZED_STATEMENT
    7. @@ -148,9 +147,6 @@ PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
    8. }
    9. int id = atoi(id_string);
    10. + if (id < 0) {
    11. + return PREPARE_NEGATIVE_ID;
    12. + }
    13. if (strlen(username) > COLUMN_USERNAME_SIZE) {
    14. return PREPARE_STRING_TOO_LONG;
    15. }
    16. @@ -230,9 +226,6 @@ int main(int argc, char* argv[]) {
    17. switch (prepare_statement(input_buffer, &statement)) {
    18. case (PREPARE_SUCCESS):
    19. + printf("ID must be positive.\n");
    20. + continue;
    21. case (PREPARE_STRING_TOO_LONG):
    22. printf("String is too long.\n");
    23. continue;

    Alright, that’s enough testing for now. Next is a very important feature: persistence! We’re going to save our database to a file and read it back out again.

    It’s gonna be great.

    Here’s the complete diff for this part:

    1. enum PrepareResult_t {
    2. PREPARE_SUCCESS,
    3. + PREPARE_NEGATIVE_ID,
    4. + PREPARE_STRING_TOO_LONG,
    5. PREPARE_SYNTAX_ERROR,
    6. PREPARE_UNRECOGNIZED_STATEMENT
    7. };
    8. @@ -33,8 +35,8 @@ const uint32_t COLUMN_USERNAME_SIZE = 32;
    9. const uint32_t COLUMN_EMAIL_SIZE = 255;
    10. struct Row_t {
    11. uint32_t id;
    12. - char username[COLUMN_USERNAME_SIZE];
    13. - char email[COLUMN_EMAIL_SIZE];
    14. + char username[COLUMN_USERNAME_SIZE + 1];
    15. + char email[COLUMN_EMAIL_SIZE + 1];
    16. };
    17. typedef struct Row_t Row;
    18. @@ -133,17 +135,40 @@ MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
    19. }
    20. }
    21. +PrepareResult prepare_insert(InputBuffer* input_buffer, Statement* statement) {
    22. + statement->type = STATEMENT_INSERT;
    23. +
    24. + char* keyword = strtok(input_buffer->buffer, " ");
    25. + char* id_string = strtok(NULL, " ");
    26. + char* username = strtok(NULL, " ");
    27. + char* email = strtok(NULL, " ");
    28. +
    29. + if (id_string == NULL || username == NULL || email == NULL) {
    30. + return PREPARE_SYNTAX_ERROR;
    31. + }
    32. +
    33. + int id = atoi(id_string);
    34. + if (id < 0) {
    35. + return PREPARE_NEGATIVE_ID;
    36. + }
    37. + if (strlen(username) > COLUMN_USERNAME_SIZE) {
    38. + return PREPARE_STRING_TOO_LONG;
    39. + }
    40. + if (strlen(email) > COLUMN_EMAIL_SIZE) {
    41. + return PREPARE_STRING_TOO_LONG;
    42. + }
    43. +
    44. + statement->row_to_insert.id = id;
    45. + strcpy(statement->row_to_insert.username, username);
    46. + strcpy(statement->row_to_insert.email, email);
    47. +
    48. + return PREPARE_SUCCESS;
    49. +}
    50. +
    51. PrepareResult prepare_statement(InputBuffer* input_buffer,
    52. Statement* statement) {
    53. if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
    54. - statement->type = STATEMENT_INSERT;
    55. - int args_assigned = sscanf(
    56. - input_buffer->buffer, "insert %d %s %s", &(statement->row_to_insert.id),
    57. - statement->row_to_insert.username, statement->row_to_insert.email);
    58. - if (args_assigned < 3) {
    59. - return PREPARE_SYNTAX_ERROR;
    60. - }
    61. - return PREPARE_SUCCESS;
    62. + return prepare_insert(input_buffer, statement);
    63. }
    64. if (strcmp(input_buffer->buffer, "select") == 0) {
    65. statement->type = STATEMENT_SELECT;
    66. @@ -205,6 +230,12 @@ int main(int argc, char* argv[]) {
    67. switch (prepare_statement(input_buffer, &statement)) {
    68. case (PREPARE_SUCCESS):
    69. break;
    70. + case (PREPARE_NEGATIVE_ID):
    71. + printf("ID must be positive.\n");
    72. + continue;
    73. + case (PREPARE_STRING_TOO_LONG):
    74. + printf("String is too long.\n");
    75. + continue;
    76. case (PREPARE_SYNTAX_ERROR):
    77. continue;

    And we added tests:

    Part 3 - An In-Memory, Append-Only, Single-Table Database