15.3. 平行查詢計畫

    15.3.2. Parallel Joins

    15.3.4. Parallel Plan Tips

    The following types of parallel-aware table scans are currently supported.

    • In aparallel sequential scan, the table’s blocks will be divided among the cooperating processes. Blocks are handed out one at a time, so that access to the table remains sequential.
    • In aparallel index scan_or_parallel index-only scan, the cooperating processes take turns reading data from the index. Currently, parallel index scans are supported only for btree indexes. Each process will claim a single index block and will scan and return all tuples referenced by that block; other process can at the same time be returning tuples from a different index block. The results of a parallel btree scan are returned in sorted order within each worker process.

    Only the scan types listed above may be used for a scan on the driving table within a parallel plan. Other scan types, such as parallel scans of non-btree indexes, may be supported in the future.

    Just as in a non-parallel plan, the driving table may be joined to one or more other tables using a nested loop, hash join, or merge join. The inner side of the join may be any kind of non-parallel plan that is otherwise supported by the planner provided that it is safe to run within a parallel worker. For example, if a nested loop join is chosen, the inner plan may be an index scan which looks up a value taken from the outer side of the join.

    PostgreSQLsupports parallel aggregation by aggregating in two stages. First, each process participating in the parallel portion of the query performs an aggregation step, producing a partial result for each group of which that process is aware. This is reflected in the plan as anode. Second, the partial results are transferred to the leader via theGathernode. Finally, the leader re-aggregates the results across all workers in order to produce the final result. This is reflected in the plan as aFinalize Aggregatenode.

    Because thenode runs on the leader process, queries which produce a relatively large number of groups in comparison to the number of input rows will appear less favorable to the query planner. For example, in the worst-case scenario the number of groups seen by theFinalize Aggregatenode could be as many as the number of input rows which were seen by all worker processes in thePartial Aggregatestage. For such cases, there is clearly going to be no performance benefit to using parallel aggregation. The query planner takes this into account during the planning process and is unlikely to choose parallel aggregate in this scenario.

    Parallel aggregation is not supported in all situations. Each aggregate must befor parallelism and must have a combine function. If the aggregate has a transition state of type, it must have serialization and deserialization functions. SeeCREATE AGGREGATEfor more details. Parallel aggregation is not supported if any aggregate function call containsDISTINCTorORDER BYclause and is also not supported for ordered set aggregates or when the query involves. It can only be used when all joins involved in the query are also part of the parallel portion of the plan.

    When executing a parallel plan, you can useEXPLAIN (ANALYZE, VERBOSE)to display per-worker statistics for each plan node. This may be useful in determining whether the work is being evenly distributed between all plan nodes and more generally in understanding the performance characteristics of the plan.