Batch operations

    To insert data in a batch, prepared statements with the bind values are added to the write batch. This is done in order to reduce repeated statement parsing overhead.

    In order to perform a batch insert operation in Java, first create a object. Next add the desired number of prepared and bound insert statements to it. Finally, execute the batch object. This is shown below.

    Range queries are very efficient as the database keeps the data together on disk. Range queries can only be performed on clustering columns of the primary key. In order to perform range queries, the table should have been created with clustering columns. These use-cases generally need have a sort order for some of the primary key columns.

    Consider a table which has a hash column h and two clustering columns r1 and r2. The following range queries are efficient.

    • Query a range of values for r1 given h.
    1. > SELECT * FROM table WHERE h = '...' AND r1 < '<upper-bound>' AND r1 > '<lower-bound>';
    1. > SELECT * FROM table WHERE h = '...' AND r1 = '...' AND r2 < '<upper-bound>' AND r2 > '<lower-bound>';
    • Query a range of values for r2 given h - may not be efficient. This query will need to iterate through all the unique values of r1 in order to fetch the result and would be less efficient if a key has a lot of values for the r1 column.
    • Query a range of values for r1 without being specified - may not be efficient. This query will perform a full scan of the table and would be less efficient if the table is large.
    1. > SELECT * FROM table WHERE r1 < '<upper-bound>' AND r1 > '<lower-bound>';

    Consider a table which has a hash column h and a clustering column r.

    • Query a set of values of h - this operation will perform the lookups for the various hash keys and return the response. The read queries are batched at a tablet level and executed in parallel. This query will be more efficient than performing each lookup from the application.
    1. > SELECT * FROM table WHERE h IN ('<value1>', '<value2>', ...);
    • Query a set of values for r given one value of h - this query is efficient and will seek to the various values of r for the given value of h.
    • Query a set of values for h and a set of values for r. This query will do point lookups for each combination of the provided h and values. For example, if the query specifies 3 values for h and 2 values for r, there will be 6 lookups performed internally and the result set could have upto 6 rows.
    1. > SELECT * FROM table WHERE h IN ('<value1>', '<value2>', ...) AND r IN ('<value1>', '<value2>', ...);

    You can find a working example of using transactions with YugabyteDB in our . This application writes batched key-value pairs with a configurable number of keys per batch. There are multiple readers and writers running in parallel performing these batch writes.

    1. Usage:
    2. --workload CassandraBatchKeyValue \
    3. --nodes 127.0.0.1:9042
    4. Other options (with default values):
    5. [ --num_unique_keys 1000000 ]
    6. [ --num_reads -1 ]
    7. [ --num_writes -1 ]
    8. [ --value_size 0 ]
    9. [ --num_threads_read 24 ]
    10. [ --num_threads_write 2 ]
    11. [ --table_ttl_seconds -1 ]

    Browse the Java source code for the batch application to see how everything fits together.