优化器对于子查询一般使用嵌套执行的方式,也就是父查询每生成一行数据后,都需要执行一次子查询。使用这种方式需要多次执行子查询,执行效率很低。对于子查询的优化方式,一般会将其改写为联接操作,可大大提高执行效率,主要优点如下:

  • 可避免子查询多次执行。

  • 优化器可根据统计信息选择更优的联接顺序和联接方法。

  • 子查询的联接条件、过滤条件改写为父查询的条件后,优化器可以进行进一步优化,比如条件下压等。

子查询改写的方式主要包括视图合并、子查询展开和将 ANY/ALL 使用 MAX/MIN 改写等。

视图合并

视图合并是指将代表一个视图的子查询合并到包含该视图的查询中,视图合并后,有助于优化器增加联接顺序的选择、访问路径的选择以及进一步做其他改写操作,从而选择更优的执行计划。

OceanBase 数据库支持对 SPJ 视图进行合并。如下示例为 Q1 改写为 Q2:

如果 Q1 不进行改写,则其联接顺序有以下几种:

  • t1, v(t2,t3)

  • t1, v(t3,t2)

  • v(t2,t3), t1

  • v(t3,t2), t1

进行视图合并改写后,可选择的联接顺序有:

  • t1, t2, t3

  • t1, t3, t2

  • t2, t1, t3

  • t2, t3, t1

  • t3, t1, t2

  • t3, t2, t1

子查询展开

子查询展开是指将 WHERE 条件中子查询提升到父查询中,并作为联接条件与父查询并列进行展开。转换后子查询将不存在,外层父查询中会变成多表联接。

这样改写的好处是优化器在进行路径选择、联接方法和联接排序时都会考虑到子查询中的表,从而可以获得更优的执行计划。涉及的子查询表达式一般有 NOT IN、IN、NOT EXIST、EXIST、ANY、ALL。

子查询展开的方式如下:

  • 改写条件使生成的联接语句能够返回与原始语句相同的行。

  • 展开为半联接(SEMI JOIN / ANTI JOIN)

    如下例所示,t2.c2 不具有唯一性,改为 SEMI JOIN,该语句改写后执行计划为:

    1. Query OK, 0 rows affected (0.17 sec)
    2. obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
    3. Query OK, 0 rows affected (0.01 sec)
    4. obclient>EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G;
    5. *************************** 1. row ***************************
    6. Query Plan:
    7. =======================================
    8. |ID|OPERATOR |NAME|EST. ROWS|COST|
    9. ---------------------------------------
    10. |0 |HASH SEMI JOIN| |495 |3931|
    11. |1 | TABLE SCAN |t1 |1000 |499 |
    12. |2 | TABLE SCAN |t2 |1000 |433 |
    13. =======================================
    14. Outputs & filters:
    15. -------------------------------------
    16. 0 - output([t1.c1], [t1.c2]), filter(nil),
    17. equal_conds([t1.c1 = t2.c2]), other_conds(nil)
    18. 1 - output([t1.c1], [t1.c2]), filter(nil),
    19. access([t1.c1], [t1.c2]), partitions(p0)
    20. 2 - output([t2.c2]), filter(nil),
    21. access([t2.c2]), partitions(p0)

    将查询前面操作符改为 NOT IN 后,可改写为 ANTI JOIN,具体计划如下例所示:

    1. obclient>EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2)\G;
    2. *************************** 1. row ***************************
    3. Query Plan:
    4. ================================================
    5. |ID|OPERATOR |NAME|EST. ROWS|COST |
    6. ------------------------------------------------
    7. |0 |NESTED-LOOP ANTI JOIN| |0 |520245|
    8. |1 | TABLE SCAN |t1 |1000 |499 |
    9. |2 | TABLE SCAN |t2 |22 |517 |
    10. ================================================
    11. Outputs & filters:
    12. -------------------------------------
    13. 0 - output([t1.c1], [t1.c2]), filter(nil),
    14. conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)])
    15. 1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil),
    16. access([t1.c1], [t1.c2]), partitions(p0)
    17. 2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]),
    18. access([t2.c2]), partitions(p0)
  • 子查询展开为内联接

    上面示例的 Q1 中如果将 t2.c2 改为 t2.c1,由于 t2.c1 为主键,子查询输出具有唯一性,此时可以直接转换为内联接,如下例所示:

    1. Q1:
    2. obclient>SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2)\G;
    3. <==>
    4. Q2:
    5. obclient>SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;

    Q1 改写后的计划如下例所示:

    1. obclient>EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2)\G;
    2. *************************** 1. row ***************************
    3. Query Plan:
    4. ====================================
    5. |ID|OPERATOR |NAME|EST. ROWS|COST|
    6. ------------------------------------
    7. |0 |HASH JOIN | |1980 |3725|
    8. |1 | TABLE SCAN|t2 |1000 |411 |
    9. |2 | TABLE SCAN|t1 |1000 |499 |
    10. ====================================
    11. Outputs & filters:
    12. -------------------------------------
    13. 0 - output([t1.c1], [t1.c2]), filter(nil),
    14. equal_conds([t1.c1 = t2.c1]), other_conds(nil)
    15. 1 - output([t2.c1]), filter(nil),
    16. access([t2.c1]), partitions(p0)
    17. 2 - output([t1.c1], [t1.c2]), filter(nil),
    18. access([t1.c1], [t1.c2]), partitions(p0)

    对于 NOT IN、IN、NOT EXIST、EXIST、ANY、ALL 子查询表达式都可以对应做类似的改写操作。

ANY/ALL 使用 MAX/MIN 改写

对于 ANY/ALL 的子查询,如果子查询中没有 GROUP BY 子句、聚集函数以及 HAVING 时,以下表达式可以使用聚集函数 MIN/MAX 进行等价转换,其中 col_item 为单独列且有非 NULL 属性:

  1. val > ALL(SELECT col_item ...) <==> val > ALL(SELECT MAX(col_item) ...);
  2. val >= ALL(SELECT col_item ...) <==> val >= ALL(SELECT MAX(col_item) ...);
  3. val < ALL(SELECT col_item ...) <==> val < ALL(SELECT MIN(col_item) ...);
  4. val <= ALL(SELECT col_item ...) <==> val <= ALL(SELECT MIN(col_item) ...);
  5. val > ANY(SELECT col_item ...) <==> val > ANY(SELECT MIN(col_item) ...);
  6. val >= ANY(SELECT col_item ...) <==> val >= ANY(SELECT MIN(col_item) ...);
  7. val < ANY(SELECT col_item ...) <==> val < ANY(SELECT MAX(col_item) ...);
  8. val <= ANY(SELECT col_item ...) <==> val <= ANY(SELECT MAX(col_item) ...);

将子查询更改为含有 MAX/MIN 的子查询后,再结合使用 MAX/MIN 的改写,可减少改写前对内表的多次扫描,如下例所示:

  1. obclient>SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
  2. <==>
  3. obclient>SELECT c1 FROM t1 WHERE c1 > ANY(SELECT MIN(c1) FROM t2);

结合 MAX/MIN 的改写后,可利用 t2.c1 的主键序将 LIMIT 1 直接下压到 TABLE SCAN,将 MIN 值输出,执行计划如下:

  1. obclient>EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2)\G;
  2. *************************** 1. row ***************************
  3. Query Plan:
  4. ===================================================
  5. |ID|OPERATOR |NAME |EST. ROWS|COST|
  6. ---------------------------------------------------
  7. |1 | TABLE SCAN |t1 |1 |37 |
  8. |2 | SCALAR GROUP BY| |1 |37 |
  9. |3 | SUBPLAN SCAN |subquery_table|1 |37 |
  10. |4 | TABLE SCAN |t2 |1 |36 |
  11. ===================================================
  12. Outputs & filters:
  13. -------------------------------------
  14. 0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
  15. exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
  16. 1 - output([t1.c1]), filter(nil),
  17. access([t1.c1]), partitions(p0)
  18. 2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
  19. group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
  20. 3 - output([subquery_table.c1]), filter(nil),
  21. access([subquery_table.c1])
  22. 4 - output([t2.c1]), filter(nil),
  23. access([t2.c1]), partitions(p0),
  24. limit(1), offset(nil)

外联接操作可分为左外联接、右外联接和全外联接。在联接过程中,由于外联接左右顺序不能变换,优化器对联接顺序的选择会受到限制。外联接消除是指将外联接转换成内联接,从而可以提供更多可选择的联接路径,供优化器考虑。

如果进行外联接消除,需要存在“空值拒绝条件”,即在 WHERE 条件中存在,当内表生成的值为 NULL 时,输出为 FALSE 的条件。

如下例所示:

这是一个外联接,在其输出行中 t2.c2 可能为 NULL。如果加上一个条件 ,则通过该条件过滤后,t2.c1 输出不可能为 NULL, 从而可以将外联接转换为内联接。

  1. obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
  2. <==>
  3. obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT INNER JOIN t2 ON t1.c2 = t2.c2
  4. WHERE t2.c2 > 5;

HAVING 条件消除

如果查询中没有聚集操作及 GROUP BY,则 HAVING 可以合并到 WHERE 条件中,并将 HAVING 条件删除, 从而可以将 HAVING 条件在 WHERE 条件中统一管理,并进行进一步相关优化。

  1. obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
  2. <==>
  3. obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;
  1. obclient>EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1\G;
  2. *************************** 1. row ***************************
  3. Query Plan:
  4. =========================================
  5. |ID|OPERATOR |NAME|EST. ROWS|COST|
  6. -----------------------------------------
  7. |0 |NESTED-LOOP JOIN| |1 |59 |
  8. |1 | TABLE SCAN |t1 |1 |37 |
  9. |2 | TABLE GET |t2 |1 |36 |
  10. =========================================
  11. Outputs & filters:
  12. -------------------------------------
  13. 0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
  14. conds(nil), nl_params_([t1.c1])
  15. 1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
  16. access([t1.c1], [t1.c2]), partitions(p0)
  17. 2 - output([t2.c1], [t2.c2]), filter(nil),
  18. access([t2.c1], [t2.c2]), partitions(p0)

等价关系推导

等价关系推导是指利用比较操作符的传递性,推倒出新的条件表达式,从而减少需要处理的行数或者选择到更有效的索引。

OceanBase 数据库可对等值联接进行推导,比如 a = b AND a > 1 可以推导出 a = b AND a > 1 AND b > 1, 如果 b 上有索引,且 b > 1 在该索引选择率很低,则可以大大提升访问 b 列所在表的性能。

如下例所示,条件 t1.c1 = t2.c2 AND t1.c1 > 2,等价推导后为 t1.c1 = t2.c2 AND t1.c1 > 2 AND t2.c2 > 2,从计划中可以看到 t2.c2 已下压到 TABLE SCAN,并且使用 t2.c2 对应的索引。

  1. obclient>CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
  2. Query OK, 0 rows affected (0.15 sec)
  3. obclient>CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
  4. Query OK, 0 rows affected (0.10 sec)
  5. /*此命令需运行于 MySQL 模式下*/
  6. obclient>EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2
  7. WHERE t1.c1 = t2.c2 AND t1.c1 > 2\G;
  8. *************************** 1. row ***************************
  9. Query Plan:
  10. ==========================================
  11. |ID|OPERATOR |NAME |EST. ROWS|COST|
  12. ------------------------------------------
  13. |0 |MERGE JOIN | |5 |78 |
  14. |1 | TABLE SCAN|t2(IDX_c2)|5 |37 |
  15. |2 | TABLE SCAN|t1 |3 |37 |
  16. ==========================================
  17. Outputs & filters:
  18. -------------------------------------
  19. 0 - output([t1.c1], [t2.c2]), filter(nil),
  20. equal_conds([t1.c1 = t2.c2]), other_conds(nil)
  21. 1 - output([t2.c2]), filter(nil),
  22. access([t2.c2]), partitions(p0),
  23. is_index_back=false,
  24. range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
  25. range_cond([t2.c2 > 2])
  26. 2 - output([t1.c1]), filter(nil),
  27. access([t1.c1]), partitions(p0),
  28. is_index_back=false,
  29. range_key([t1.c1]), range(2 ; MAX),
  30. range_cond([t1.c1 > 2])

恒真/假消除

对于如下恒真恒假条件可以进行消除:

  • false and expr = 恒 false

  • true or expr = 恒 true

如下例所示,对于 WHERE 0 > 1 AND c1 = 3,由于 0 > 1 使得 AND 恒假, 所以该 SQL 不用执行,可直接返回,从而加快查询的执行。

  1. obclient>EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3\G;
  2. *************************** 1. row ***************************
  3. Query Plan:
  4. ===================================
  5. |ID|OPERATOR |NAME|EST. ROWS|COST|
  6. -----------------------------------
  7. |0 |TABLE SCAN|t1 |0 |38 |
  8. ===================================
  9. Outputs & filters:
  10. -------------------------------------
  11. 0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
  12. access([t1.c1], [t1.c2]), partitions(p0),
  13. is_index_back=false, filter_before_indexback[false,false],
  14. range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
  15. range(MAX,MAX,MAX ; MIN,MIN,MIN)always false

冗余排序消除

冗余排序消除是指删除 order item 中不需要的项,减少排序开销。以下三种情况可进行排序消除:

  • ORDER BY 表达式列表中有重复列,可进行去重后排序。

    1. obclient>SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
    2. <==>
    3. obclient>SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
  • ORDER BY 列中存在 where 中有单值条件的列,该列排序可删除。

    1. obclient>SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
    2. <==>
    3. obclient>SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;
  • 如果本层查询有 ORDER BY 但是没有 LIMIT,且本层查询位于父查询的集合操作中,则 ORDER BY 可消除。因为对两个有序的集合做 UNION 操作,其结果是乱序的。但是如果 ORDER BY 中有 LIMIT,则语义是取最大/最小的 N 个,此时不能消除 ORDER BY,否则有语义错误。

LIMIT 下压

LIMIT 下压改写是指将 LIMIT 下降到子查询中,OceanBase 数据库现在支持在不改变语义的情况下,将 LIMIT 下压到视图(示例 1)及 UNION 对应子查询(示例 2)中。

示例 1:

  1. obclient>SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1;
  2. <==>
  3. obclient>SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

示例 2:

  1. obclient>(SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
  2. <==>
  3. obclient>(SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

DISTINCT 消除

  • 如果 select item 中只包含常量,则可以消除 DISTINCT,并加上 LIMIT 1。

    1. obclient>SELECT DISTINCT 1,2 FROM t1 ;
    2. <==>
    3. obclient>CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
    4. Query OK, 0 rows affected (0.17 sec)
    5. obclient>EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1\G;
    6. *************************** 1. row ***************************
    7. Query Plan:
    8. |ID|OPERATOR |NAME|EST. ROWS|COST|
    9. -----------------------------------
    10. |0 |TABLE SCAN|t1 |1 |36 |
    11. ===================================
    12. Outputs & filters:
    13. -------------------------------------
    14. 0 - output([1], [2]), filter(nil),
    15. access([t1.c1]), partitions(p0),
    16. limit(1), offset(nil),
    17. is_index_back=false,
    18. range_key([t1.c1]), range(MIN ; MAX)always true
  • 如果 select item 中包含确保唯一性约束的列,则 DISTINCT 能够消除,如下示例中 (c1, c2)为主键,可确保 c1、c2 和 c3 唯一性, 从而 DISTINCT 可消除。

    1. obclient>CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
    2. Query OK, 0 rows affected (0.17 sec)
    3. obclient>SELECT DISTINCT c1, c2, c3 FROM t2;
    4. <==>
    5. obclient>SELECT c1, c2 c3 FROM t2;
    6. obclient>EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2\G;
    7. *************************** 1. row ***************************
    8. Query Plan:
    9. ===================================
    10. |ID|OPERATOR |NAME|EST. ROWS|COST|
    11. -----------------------------------
    12. |0 |TABLE SCAN|t2 |1000 |455 |
    13. ===================================
    14. Outputs & filters:
    15. -------------------------------------
    16. 0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
    17. access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)

MIN/MAX 改写

  • 当 MIN/MAX 函数中参数为索引前缀列,且不含 GROUP BY 时,可将该 scalar aggregate 转换为走索引扫描 1 行的情况,如下例所示:

    1. obclient>CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
    2. Query OK, 0 rows affected (0.17 sec)
    3. obclient>SELECT MIN(c2) FROM t1;
    4. <==>
    5. obclient>SELECT MIN(c2) FROM (SELECT c2 FROM t2 ORDER BY c2 LIMIT 1) AS t;
    6. obclient>EXPLAIN SELECT MIN(c2) FROM t1\G;
    7. *************************** 1. row ***************************
    8. Query Plan:
    9. ==================================================
    10. |ID|OPERATOR |NAME |EST. ROWS|COST|
    11. --------------------------------------------------
    12. |0 |SCALAR GROUP BY| |1 |37 |
    13. |1 | SUBPLAN SCAN |subquery_table|1 |37 |
    14. |2 | TABLE SCAN |t1(idx_c2_c3) |1 |36 |
    15. ==================================================
    16. Outputs & filters:
    17. -------------------------------------
    18. 0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
    19. group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
    20. 1 - output([subquery_table.c2]), filter(nil),
    21. access([subquery_table.c2])
    22. 2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
    23. access([t1.c2]), partitions(p0),
    24. limit(1), offset(nil)
  • 如果 SELECT MIN/MAX 的参数为常量,而且包含 GROUP BY,可以将 MIN/MAX 改为常量,从而减少 MIN/MAX 的计算开销。

    1. obclient>SELECT MAX(1) FROM t1 GROUP BY c1;
    2. <==>
    3. obclient>SELECT 1 FROM t1 GROUP BY c1;
    4. obclient>EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1\G;
    5. *************************** 1. row ***************************
    6. Query Plan:
    7. ===================================
    8. |ID|OPERATOR |NAME|EST. ROWS|COST|
    9. -----------------------------------
    10. |0 |TABLE SCAN|t1 |1000 |411 |
    11. ===================================
    12. Outputs & filters:
    13. -------------------------------------
    14. 0 - output([1]), filter(nil),
    15. access([t1.c1]), partitions(p0),
    16. is_index_back=false,
    17. range_key([t1.c1]), range(MIN ; MAX)always true
  • 如果 SELECT MIN/MAX 的参数为常量,而且不含 GROUP BY,可以按照如下示例进行改写,从而走索引只需扫描 1 行。

    1. obclient>SELECT MAX(1) FROM t1;
    2. <==>
    3. obclient>SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;
    4. obclient>EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1\G;
    5. *************************** 1. row ***************************
    6. Query Plan:
    7. ==================================================
    8. |ID|OPERATOR |NAME |EST. ROWS|COST|
    9. --------------------------------------------------
    10. |0 |SCALAR GROUP BY| |1 |37 |
    11. |1 | SUBPLAN SCAN |subquery_table|1 |37 |
    12. |2 | TABLE SCAN |t1 |1 |36 |
    13. ==================================================
    14. Outputs & filters:
    15. -------------------------------------
    16. 0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
    17. group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
    18. 1 - output([subquery_table.subquery_col_alias]), filter(nil),
    19. access([subquery_table.subquery_col_alias])
    20. 2 - output([1]), filter(nil),
    21. access([t1.c1]), partitions(p0),
    22. limit(1), offset(nil),