Part 2 - World's Simplest SQL Compiler and Virtual Machine

    We’re making a clone of sqlite. The “front-end” of sqlite is a SQL compiler that parses a string and outputs an internal representation called bytecode.

    This bytecode is passed to the virtual machine, which executes it.

    Breaking things into two steps like this has a couple advantages:

    • Reduces the complexity of each part (e.g. virtual machine does not worry about syntax errors)
    • Allows compiling common queries once and caching the bytecode for improved performance

    With this in mind, let’s refactor our main function and support two new keywords in the process (the refactored loop appears in the full diff at the end of this part).

    Non-SQL statements like .exit are called “meta-commands”. They all start with a dot, so we check for them and handle them in a separate function.

    Next, we add a step, prepare_statement, that converts the line of input into our internal representation of a statement. This is our hacky version of the sqlite front-end.

    Lastly, we pass the prepared statement to execute_statement. This function will eventually become our virtual machine.

    Notice that two of our new functions return enums indicating success or failure:

        enum MetaCommandResult_t {
          META_COMMAND_SUCCESS,
          META_COMMAND_UNRECOGNIZED_COMMAND
        };
        typedef enum MetaCommandResult_t MetaCommandResult;

        enum PrepareResult_t { PREPARE_SUCCESS, PREPARE_UNRECOGNIZED_STATEMENT };
        typedef enum PrepareResult_t PrepareResult;

    “Unrecognized statement”? That seems a bit like an exception. But exceptions are bad (and C doesn’t even support them), so I’m using enum result codes wherever practical. With warnings enabled (for example -Wall in GCC or Clang), the C compiler will complain if my switch statement has no default case and doesn’t handle a member of the enum, so we can feel a little more confident that we handle every result of a function. Expect more result codes to be added in the future.
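
    To make the compiler check concrete, here is a minimal sketch. The helper prepare_result_message is hypothetical and not part of this tutorial's code; it maps each PrepareResult to a message using a switch with no default case. If a member is later added to the enum and the switch is not updated, GCC and Clang can report the unhandled value through -Wswitch (which -Wall enables).

```c
#include <stddef.h>

enum PrepareResult_t { PREPARE_SUCCESS, PREPARE_UNRECOGNIZED_STATEMENT };
typedef enum PrepareResult_t PrepareResult;

/* Hypothetical helper, for illustration only.
 * The switch deliberately has no `default:` label, so the compiler can
 * warn when an enum member is left unhandled. */
const char* prepare_result_message(PrepareResult result) {
  switch (result) {
    case PREPARE_SUCCESS:
      return "ok";
    case PREPARE_UNRECOGNIZED_STATEMENT:
      return "unrecognized statement";
  }
  /* Only reached if `result` holds a value outside the enum. */
  return NULL;
}
```

    Adding a default case would silence the warning, which is why the sketch leaves it out.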

    do_meta_command is just a wrapper for existing functionality that leaves room for more commands.
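
    For reference, here is a sketch of do_meta_command that follows the diff at the end of this part. The one assumption is the InputBuffer definition: it is pared down here to the single field the function reads, while the real struct is defined in Part 1 and may carry more.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: a reduced InputBuffer with only the field used below. */
struct InputBuffer_t {
  char* buffer;
};
typedef struct InputBuffer_t InputBuffer;

enum MetaCommandResult_t {
  META_COMMAND_SUCCESS,
  META_COMMAND_UNRECOGNIZED_COMMAND
};
typedef enum MetaCommandResult_t MetaCommandResult;

MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
  if (strcmp(input_buffer->buffer, ".exit") == 0) {
    /* Terminates the process, so no result code is returned here. */
    exit(EXIT_SUCCESS);
  } else {
    return META_COMMAND_UNRECOGNIZED_COMMAND;
  }
}
```

    META_COMMAND_SUCCESS is not returned yet; presumably it is there for future commands that do something and then return to the prompt.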

    Our “prepared statement” right now just contains an enum with two possible values. It will contain more data as we allow parameters in statements:

        enum StatementType_t { STATEMENT_INSERT, STATEMENT_SELECT };
        typedef enum StatementType_t StatementType;

        struct Statement_t {
          StatementType type;
        };
        typedef struct Statement_t Statement;

    prepare_statement (our “SQL Compiler”) does not understand SQL right now. In fact, it only understands two words, insert and select.
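
    Here is a sketch of prepare_statement that follows the diff at the end of this part. As an assumption, InputBuffer is reduced to the one field the function reads. Note the two different comparisons: insert is matched as a six-character prefix with strncmp, presumably because the keyword will be followed by data, while select must match the whole line.

```c
#include <string.h>

/* Assumption: a reduced InputBuffer with only the field used below. */
struct InputBuffer_t {
  char* buffer;
};
typedef struct InputBuffer_t InputBuffer;

enum PrepareResult_t { PREPARE_SUCCESS, PREPARE_UNRECOGNIZED_STATEMENT };
typedef enum PrepareResult_t PrepareResult;

enum StatementType_t { STATEMENT_INSERT, STATEMENT_SELECT };
typedef enum StatementType_t StatementType;

struct Statement_t {
  StatementType type;
};
typedef struct Statement_t Statement;

PrepareResult prepare_statement(InputBuffer* input_buffer,
                                Statement* statement) {
  /* Prefix match: anything after "insert" is ignored for now. */
  if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
    statement->type = STATEMENT_INSERT;
    return PREPARE_SUCCESS;
  }
  /* Exact match: the line must be exactly "select". */
  if (strcmp(input_buffer->buffer, "select") == 0) {
    statement->type = STATEMENT_SELECT;
    return PREPARE_SUCCESS;
  }

  return PREPARE_UNRECOGNIZED_STATEMENT;
}
```

    One consequence of the prefix match is that a line such as "inserted" is also accepted as an insert, which is tolerable for a stub but worth keeping in mind.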

    Lastly, execute_statement contains a few stubs:

        void execute_statement(Statement* statement) {
          switch (statement->type) {
            case (STATEMENT_INSERT):
              printf("This is where we would do an insert.\n");
              break;
            case (STATEMENT_SELECT):
              printf("This is where we would do a select.\n");
              break;
          }
        }

    Note that it doesn’t return any error codes because there’s nothing that could go wrong yet.

    With these refactors, we now recognize two new keywords!

    The skeleton of our database is taking shape… wouldn’t it be nice if it stored data? In the next part, we’ll implement insert and select, creating the world’s worst data store. In the meantime, here’s the entire diff from this part:

    @@ -10,6 +10,23 @@ struct InputBuffer_t {
     };
     typedef struct InputBuffer_t InputBuffer;
     
    +enum MetaCommandResult_t {
    +  META_COMMAND_SUCCESS,
    +  META_COMMAND_UNRECOGNIZED_COMMAND
    +};
    +typedef enum MetaCommandResult_t MetaCommandResult;
    +
    +enum PrepareResult_t { PREPARE_SUCCESS, PREPARE_UNRECOGNIZED_STATEMENT };
    +typedef enum PrepareResult_t PrepareResult;
    +
    +enum StatementType_t { STATEMENT_INSERT, STATEMENT_SELECT };
    +typedef enum StatementType_t StatementType;
    +
    +struct Statement_t {
    +  StatementType type;
    +};
    +typedef struct Statement_t Statement;
    +
     InputBuffer* new_input_buffer() {
       InputBuffer* input_buffer = malloc(sizeof(InputBuffer));
       input_buffer->buffer = NULL;
    @@ -35,16 +52,66 @@ void read_input(InputBuffer* input_buffer) {
       input_buffer->buffer[bytes_read - 1] = 0;
     }
     
    +MetaCommandResult do_meta_command(InputBuffer* input_buffer) {
    +  if (strcmp(input_buffer->buffer, ".exit") == 0) {
    +    exit(EXIT_SUCCESS);
    +  } else {
    +    return META_COMMAND_UNRECOGNIZED_COMMAND;
    +  }
    +}
    +
    +PrepareResult prepare_statement(InputBuffer* input_buffer,
    +                                Statement* statement) {
    +  if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
    +    statement->type = STATEMENT_INSERT;
    +    return PREPARE_SUCCESS;
    +  }
    +  if (strcmp(input_buffer->buffer, "select") == 0) {
    +    statement->type = STATEMENT_SELECT;
    +    return PREPARE_SUCCESS;
    +  }
    +
    +  return PREPARE_UNRECOGNIZED_STATEMENT;
    +}
    +
    +void execute_statement(Statement* statement) {
    +  switch (statement->type) {
    +    case (STATEMENT_INSERT):
    +      printf("This is where we would do an insert.\n");
    +      break;
    +    case (STATEMENT_SELECT):
    +      printf("This is where we would do a select.\n");
    +      break;
    +  }
    +}
    +
     int main(int argc, char* argv[]) {
       InputBuffer* input_buffer = new_input_buffer();
       while (true) {
         print_prompt();
         read_input(input_buffer);
     
    -    if (strcmp(input_buffer->buffer, ".exit") == 0) {
    -      exit(EXIT_SUCCESS);
    -    } else {
    -      printf("Unrecognized command '%s'.\n", input_buffer->buffer);
    +    if (input_buffer->buffer[0] == '.') {
    +      switch (do_meta_command(input_buffer)) {
    +        case (META_COMMAND_SUCCESS):
    +          continue;
    +        case (META_COMMAND_UNRECOGNIZED_COMMAND):
    +          printf("Unrecognized command '%s'\n", input_buffer->buffer);
    +          continue;
    +      }
         }
    +
    +    Statement statement;
    +    switch (prepare_statement(input_buffer, &statement)) {
    +      case (PREPARE_SUCCESS):
    +        break;
    +      case (PREPARE_UNRECOGNIZED_STATEMENT):
    +        printf("Unrecognized keyword at start of '%s'.\n",
    +               input_buffer->buffer);
    +        continue;
    +    }
    +
    +    execute_statement(&statement);
    +    printf("Executed.\n");
       }
     }