Sequences >>

    A sharded keyspace allows you to split a large database into smaller parts by distributing the rows of each table into different shards. In Vitess, each shard is assigned a . Every row has a keyspace id, and this value decides the shard in which the row lives. For key-value stores, the keyspace id is dictated by the value of the key, also known as the sharding key. In Vitess, this is known as the Primary Vindex. But it differs from a sharding key in the following ways:

    • The Vindex also decides the sharding function that controls how the data is distributed.
    • The sharding function is pluggable, allowing for user-defined sharding schemes.

    Vitess provides many predefined vindex types. The most popular ones are:

    • hash: for numbers
    • unicode_loose_md5: for text columns
    • binary_md5: for binary columns

    In our example, we are going to designate customer as a sharded keyspace, and create a customer table in it. The schema for the table is as follows:

    In the VSchema, we need to designate which column should be the Primary Vindex, and choose the vindex type for it. The customer_id column seems to be the natural choice. Since it is a number, we will choose hash as the vindex type:

    1. {
    2. "sharded": true,
    3. "vindexes": {
    4. "hash": {
    5. "type": "hash"
    6. }
    7. },
    8. "tables": {
    9. "customer": {
    10. "column_vindexes": [{
    11. "column": "customer_id",
    12. "name": "hash"
    13. }
    14. }
    15. }
    • As we will see later, vindexes can be instantiated with different input parameters. In such cases, they have to have their own distinct names.
    • Vindexes can be shared by tables, and this has special meaning. We will cover this in a later section.
    • Vindexes can also be referenced as if they were tables and can be used to compute the keyspace id for a given input.

    The column_vindexes section is a list. This is because a table can have multiple vindexes. If so, the first vindex in the list must be the Primary Vindex. More information about vindexes can be found in the .

    Alternate VSchema DDL:

    The DDL creates the hash vindex under the vindexes section, the customer table under the tables section, and associates the customer_id column to hash. For sharded keyspaces, the only way to create a table is using the above construct. This is because a primary vindex is mandatory for sharded tables.

    Every sharded table must have a Primary Vindex. A Primary Vindex must be instantiated from a vindex type that is Unique. hash, unicode_loose_md5 and binary_md5 are unique vindex types.

    The demo brings up the customer table as two shards: -80 and 80-. For a hash vindex, input values of 1, 2 and 3 fall in the -80 range, and 4 falls in the 80- range. Restarting the demo with the updated configs should allow you to perform the following:

    1. mysql> insert into customer(customer_id,uname) values(1,'alice'),(4,'dan');
    2. Query OK, 2 rows affected (0.00 sec)
    3. mysql> use `customer:-80`;
    4. Database changed
    5. +-------------+-------+
    6. | customer_id | uname |
    7. +-------------+-------+
    8. | 1 | alice |
    9. +-------------+-------+
    10. mysql> use `customer:80-`;
    11. Database changed
    12. mysql> select * from customer;
    13. +-------------+-------+
    14. | customer_id | uname |
    15. +-------------+-------+
    16. | 4 | dan |
    17. +-------------+-------+
    18. 1 row in set (0.00 sec)

    At the time of insert, the Primary Vindex is used to compute and assign a keyspace id to each row. This keyspace id gets used to decide where the row will be stored. Although a keyspace id is not explicitly stored anywhere, it must be seen as an unchanging property of that row; as if there was an invisible column for it.

    Consequently, you cannot make changes to a row that can cause the keyspace id to change. Such a change will be supported in the future through a shard move operation. Trying to change the value of a Primary Vindex results in an error:

    A Primary Vindex can also be used to find rows if referenced in a where clause:

    1. mysql> select * from customer where customer_id=1;
    2. +-------------+-------+
    3. | customer_id | uname |
    4. +-------------+-------+
    5. | 1 | alice |
    6. +-------------+-------+

    If you run the above query in the demo app, the panel on the bottom right will show that the query was executed only on one shard.

    On the other hand, the query below will get sent to all shards because there is no where clause:


    << Unsharded Keyspace