High-concurrency point query based on primary key
Doris is built on a columnar storage engine. In high-concurrency serving scenarios, users often want to retrieve entire rows of data from the system. However, when tables are wide, the columnar format greatly amplifies random read IO. The Doris query engine and planner are also too heavy for some simple queries, such as point queries, so a short path needs to be planned in the FE's query plan to handle them. FE is the access layer service for SQL queries and is written in Java, so parsing and analyzing SQL also causes high CPU overhead under high-concurrency queries. To solve these problems, Doris introduces row storage, a short query path, and PreparedStatement. Below is a guide to enabling these optimizations.
Doris provides a row store format, which can be used to speed up point queries on the merge-on-write model. For point queries on primary keys with enable_unique_key_merge_on_write enabled, the planner optimizes the query and executes it on a short path through a lightweight RPC interface. Point queries with row store on the merge-on-write model have the following requirements:
- enable_unique_key_merge_on_write should be enabled, since the storage engine needs the primary key for quick point lookups.
- light_schema_change should also be enabled, since point queries rely on the column unique id of each column.
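The requirements above can be sketched as a table definition issued over JDBC. This is a minimal illustration rather than a definitive setup: the table name tbl_point_query, its columns, the connection details, and the single-replica setting are hypothetical, and the store_row_column property used to turn on row storage is an assumption not named in the text above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class RowStoreTableSketch {
    // Hypothetical unique-key table with the properties discussed above.
    static final String DDL =
        "CREATE TABLE IF NOT EXISTS tbl_point_query (\n"
      + "    k1 INT NULL,\n"
      + "    v1 VARCHAR(64) NULL,\n"
      + "    v2 BIGINT NULL\n"
      + ") ENGINE=OLAP\n"
      + "UNIQUE KEY(k1)\n"
      + "DISTRIBUTED BY HASH(k1) BUCKETS 1\n"
      + "PROPERTIES (\n"
      + "    \"replication_num\" = \"1\",\n"                        // assumption: single-node test cluster
      + "    \"enable_unique_key_merge_on_write\" = \"true\",\n"   // primary key lookup
      + "    \"light_schema_change\" = \"true\",\n"                // required for point queries
      + "    \"store_row_column\" = \"true\"\n"                    // assumption: enables row storage
      + ")";

    // Sends the DDL to the FE over the MySQL protocol.
    static void createTable(String url, String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(DDL);
        }
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(DDL);
        // Connect only when a JDBC URL is passed, e.g. jdbc:mysql://127.0.0.1:9030/demo_db
        if (args.length == 3) {
            createTable(args[0], args[1], args[2]);
        }
    }
}
```

With such a table, a query whose condition contains only the primary key, for example SELECT * FROM tbl_point_query WHERE k1 = 123, is the kind of point query the short path targets.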
To reduce the CPU cost of parsing query SQL and SQL expressions, FE provides a PreparedStatement feature that is fully compatible with the MySQL protocol (currently it supports only point queries like those described above). When enabled, the PreparedStatement SQL and its expressions are precomputed and cached in a session-level memory buffer for later reuse. When CPU becomes the bottleneck for such queries, PreparedStatement can improve performance by 4x or more. Below is a JDBC example of using PreparedStatement.
- Set up the JDBC URL and enable server-side prepared statements
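A minimal sketch of this step, assuming the MySQL Connector/J driver is on the classpath and using its useServerPrepStmts connection property to request server-side prepared statements. The host, port 9030, database demo_db, table tbl_point_query, and key column k1 are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PreparedPointQuerySketch {
    // Builds a MySQL-protocol JDBC URL with server-side prepared statements enabled.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useServerPrepStmts=true";
    }

    // Prepares the point query once and reuses it for every key in the batch.
    static void lookup(String url, String user, String password, int[] keys) throws SQLException {
        String sql = "SELECT * FROM tbl_point_query WHERE k1 = ?"; // hypothetical table and column
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (int key : keys) {
                stmt.setInt(1, key);
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getObject(1));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        String url = buildUrl("127.0.0.1", 9030, "demo_db");
        System.out.println(url);
        // Run the lookup only when credentials are passed: <user> <password>
        if (args.length == 2) {
            lookup(url, args[0], args[1], new int[] {1, 2, 3});
        }
    }
}
```

Reusing a single PreparedStatement on one connection matters here because the cached statement lives in a session-level buffer, so a new connection would have to prepare it again.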