数据库中很多高级的改写规则(例如 complex view merge 和窗口函数改写)都需要基于代价进行改写,OceanBase 数据库后续版本会支持这些复杂的改写规则。

    OR-EXPANSION 是将一个查询改写成若干个用 UNION 组成的子查询,可以为每个子查询提供更优的优化空间,但是也会导致多个子查询的执行,所以这个改写需要基于代价去判断。

    OR-EXPANSION 的改写主要有如下三个作用:

    • 如下例所示,Q1 会被改写成 Q2 的形式,其中 Q2 中的谓词 保证了这两个子查询不会生成重复的结果。如果不进行改写,Q1 一般来说会选择主表作为访问路径,对于 Q2 来说,如果 t1 上存在索引(a)和索引(b),那么该改写可能会让 Q2 中的每一个子查询选择索引作为访问路径。

      完整示例如下:

      1. obclient>CREATE TABLE t1(a INT, b INT, c INT, d INT, e INT, INDEX IDX_a(a),
      2. INDEX IDX_b(b));
      3. Query OK, 0 rows affected (0.17 sec)
      4. /*如果不进行 OR-EXPANSION 的改写,该查询只能使用主表访问路径*/
      5. obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
      6. +--------------------------------------------------------------+
      7. | Query Plan |
      8. +--------------------------------------------------------------+
      9. | ===================================
      10. |ID|OPERATOR |NAME|EST. ROWS|COST|
      11. -----------------------------------
      12. |0 |TABLE SCAN|t1 |4 |649 |
      13. ===================================
      14. Outputs & filters:
      15. -------------------------------------
      16. 0 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([t1.a = 1 OR t1.b = 1]),
      17. access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
      18. /*改写之后,每个子查询能使用不同的索引访问路径*/
      19. obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
      20. +------------------------------------------------------------------------+
      21. | Query Plan |
      22. +------------------------------------------------------------------------+
      23. | =========================================
      24. |ID|OPERATOR |NAME |EST. ROWS|COST|
      25. -----------------------------------------
      26. |0 |UNION ALL | |3 |190 |
      27. |1 | TABLE SCAN|t1(idx_a)|2 |94 |
      28. |2 | TABLE SCAN|t1(idx_b)|1 |95 |
      29. =========================================
      30. Outputs & filters:
      31. -------------------------------------
      32. 0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t1.c, t1.c)], [UNION(t1.d, t1.d)], [UNION(t1.e, t1.e)]), filter(nil)
      33. 1 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter(nil),
      34. access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
      35. 2 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([lnnvl(t1.a = 1)]),
      36. access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p02
    • 允许每个分支使用不同的连接算法来加速查询,避免使用笛卡尔联接。

      完整示例如下:

      1. obclient> CREATE TABLE t1(a INT, b INT);
      2. Query OK, 0 rows affected (0.17 sec)
      3. obclient> CREATE TABLE t2(a INT, b INT);
      4. Query OK, 0 rows affected (0.13 sec)
      5. /*如果不进行改写,只能使用 NESTED LOOP JOIN*/
      6. obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1, t2
      7. | Query Plan |
      8. +--------------------------------------------------------------------------+
      9. | ===========================================
      10. |ID|OPERATOR |NAME|EST. ROWS|COST |
      11. -------------------------------------------
      12. |0 |NESTED-LOOP JOIN| |3957 |585457|
      13. |1 | TABLE SCAN |t1 |1000 |499 |
      14. |2 | TABLE SCAN |t2 |4 |583 |
      15. ===========================================
      16. Outputs & filters:
      17. -------------------------------------
      18. 0 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      19. conds(nil), nl_params_([t1.a], [t1.b])
      20. 1 - output([t1.a], [t1.b]), filter(nil),
      21. access([t1.a], [t1.b]), partitions(p0)
      22. 2 - output([t2.a], [t2.b]), filter([? = t2.a OR ? = t2.b]),
      23. access([t2.a], [t2.b]), partitions(p0)
      24. /*被改写之后,每个子查询都使用了 HASH JOIN*/
      25. obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;
      26. +--------------------------------------------------------------------------+
      27. | Query Plan |
      28. +--------------------------------------------------------------------------+
      29. |ID|OPERATOR |NAME|EST. ROWS|COST|
      30. -------------------------------------
      31. |0 |UNION ALL | |2970 |9105|
      32. |1 | HASH JOIN | |1980 |3997|
      33. |2 | TABLE SCAN|t1 |1000 |499 |
      34. |3 | TABLE SCAN|t2 |1000 |499 |
      35. |4 | HASH JOIN | |990 |3659|
      36. |5 | TABLE SCAN|t1 |1000 |499 |
      37. |6 | TABLE SCAN|t2 |1000 |499 |
      38. =====================================
      39. Outputs & filters:
      40. -------------------------------------
      41. 0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t2.a, t2.a)], [UNION(t2.b, t2.b)]), filter(nil)
      42. 1 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      43. equal_conds([t1.a = t2.a]), other_conds(nil)
      44. 2 - output([t1.a], [t1.b]), filter(nil),
      45. access([t1.a], [t1.b]), partitions(p0)
      46. 3 - output([t2.a], [t2.b]), filter(nil),
      47. access([t2.a], [t2.b]), partitions(p0)
      48. 4 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      49. equal_conds([t1.b = t2.b]), other_conds([lnnvl(t1.a = t2.a)])
      50. 5 - output([t1.a], [t1.b]), filter(nil),
      51. access([t1.a], [t1.b]), partitions(p0)
      52. 6 - output([t2.a], [t2.b]), filter(nil),
    • 允许每个分支分别消除排序,更加快速的获取 TOP-K 结果。

      如下例所示,Q1 会被改写成 Q2。对于 Q1 来说,执行方式是只能把满足条件的行数找出来,然后进行排序,最终取 TOP-10 结果。对于 Q2 来说,如果存在索引(a,b), 那么 Q2 中的两个子查询都可以使用索引把排序消除,每个子查询取 TOP-10 结果,然后最终对这 20 行数据排序一下取出最终的 TOP-10 行。

      1. obclient> CREATE TABLE t1(a INT, b INT, INDEX IDX_a(a, b));
      2. Query OK, 0 rows affected (0.20 sec)
      3. /*不改写的话,需要排序最终获取 TOP-K 结果*/
      4. obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.a = 2
      5. ORDER BY b LIMIT 10;
      6. +-------------------------------------------------------------------------+
      7. | Query Plan |
      8. +-------------------------------------------------------------------------+
      9. | ==========================================
      10. |ID|OPERATOR |NAME |EST. ROWS|COST|
      11. ------------------------------------------
      12. |0 |LIMIT | |4 |77 |
      13. |1 | TOP-N SORT | |4 |76 |
      14. |2 | TABLE SCAN|t1(idx_a)|4 |73 |
      15. ==========================================
      16. Outputs & filters:
      17. -------------------------------------
      18. 0 - output([t1.a], [t1.b]), filter(nil), limit(10), offset(nil)
      19. 1 - output([t1.a], [t1.b]), filter(nil), sort_keys([t1.b, ASC]), topn(10)
      20. 2 - output([t1.a], [t1.b]), filter(nil),
      21. access([t1.a], [t1.b]), partitions(p0)
      22. /* 进行改写的话,排序算子可以被消除,最终获取 TOP-K 结果*/
      23. obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2
      24. ORDER BY b LIMIT 10;
      25. +-------------------------------------------------------------------------+
      26. | Query Plan |
      27. +-------------------------------------------------------------------------+
      28. | ===========================================
      29. |ID|OPERATOR |NAME |EST. ROWS|COST|
      30. -------------------------------------------
      31. |0 |LIMIT | |3 |76 |
      32. |1 | TOP-N SORT | |3 |76 |
      33. |2 | UNION ALL | |3 |74 |
      34. |3 | TABLE SCAN|t1(idx_a)|2 |37 |
      35. |4 | TABLE SCAN|t1(idx_a)|1 |37 |
      36. ===========================================
      37. Outputs & filters:
      38. -------------------------------------
      39. 0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), limit(10), offset(nil)
      40. 1 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), sort_keys([UNION(t1.b, t1.b), ASC]), topn(10)
      41. 2 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil)
      42. 3 - output([t1.a], [t1.b]), filter(nil),
      43. access([t1.a], [t1.b]), partitions(p0),
      44. limit(10), offset(nil)
      45. 4 - output([t1.a], [t1.b]), filter([lnnvl(t1.a = 1)]),
      46. limit(10), offset(nil)