ARRAY JOIN Clause
Its name comes from the fact that it can be looked at as executing JOIN
with an array or nested data structure. The intent is similar to the arrayJoin function, but the clause functionality is broader.
Syntax:
You can specify only one ARRAY JOIN
clause in a SELECT
query.
Supported types of ARRAY JOIN
are listed below:
ARRAY JOIN
- In base case, empty arrays are not included in the result ofJOIN
.LEFT ARRAY JOIN
- The result ofJOIN
contains rows with empty arrays. The value for an empty array is set to the default value for the array element type (usually 0, empty string or NULL).
The examples below demonstrate the usage of the ARRAY JOIN
and LEFT ARRAY JOIN
clauses. Let’s create a table with an type column and insert values into it:
CREATE TABLE arrays_test
(
s String,
arr Array(UInt8)
) ENGINE = Memory;
INSERT INTO arrays_test
VALUES ('Hello', [1,2]), ('World', [3,4,5]), ('Goodbye', []);
┌─s───────────┬─arr─────┐
│ Hello │ [1,2] │
│ World │ [3,4,5] │
│ Goodbye │ [] │
└─────────────┴─────────┘
SELECT s, arr
FROM arrays_test
ARRAY JOIN arr;
┌─s─────┬─arr─┐
│ Hello │ 1 │
│ Hello │ 2 │
│ World │ 3 │
│ World │ 4 │
│ World │ 5 │
└───────┴─────┘
The next example uses the LEFT ARRAY JOIN
clause:
SELECT s, arr
FROM arrays_test
LEFT ARRAY JOIN arr;
┌─s───────────┬─arr─┐
│ Hello │ 1 │
│ Hello │ 2 │
│ World │ 3 │
│ World │ 4 │
│ World │ 5 │
│ Goodbye │ 0 │
└─────────────┴─────┘
An alias can be specified for an array in the ARRAY JOIN
clause. In this case, an array item can be accessed by this alias, but the array itself is accessed by the original name. Example:
SELECT s, arr, a
FROM arrays_test
ARRAY JOIN arr AS a;
┌─s─────┬─arr─────┬─a─┐
│ World │ [3,4,5] │ 3 │
│ World │ [3,4,5] │ 4 │
│ World │ [3,4,5] │ 5 │
└───────┴─────────┴───┘
Using aliases, you can perform ARRAY JOIN
with an external array. For example:
┌─s───────────┬─arr_external─┐
│ Hello │ 1 │
│ Hello │ 2 │
│ Hello │ 3 │
│ World │ 1 │
│ World │ 2 │
│ World │ 3 │
│ Goodbye │ 1 │
│ Goodbye │ 2 │
│ Goodbye │ 3 │
└─────────────┴──────────────┘
Multiple arrays can be comma-separated in the ARRAY JOIN
clause. In this case, JOIN
is performed with them simultaneously (the direct sum, not the cartesian product). Note that all the arrays must have the same size. Example:
SELECT s, arr, a, num, mapped
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(x -> x + 1, arr) AS mapped;
┌─s─────┬─arr─────┬─a─┬─num─┬─mapped─┐
│ Hello │ [1,2] │ 1 │ 1 │ 2 │
│ Hello │ [1,2] │ 2 │ 2 │ 3 │
│ World │ [3,4,5] │ 3 │ 1 │ 4 │
│ World │ [3,4,5] │ 4 │ 2 │ 5 │
│ World │ [3,4,5] │ 5 │ 3 │ 6 │
└───────┴─────────┴───┴─────┴────────┘
The example below uses the arrayEnumerate function:
SELECT s, arr, a, num, arrayEnumerate(arr)
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num;
┌─s─────┬─arr─────┬─a─┬─num─┬─arrayEnumerate(arr)─┐
│ Hello │ [1,2] │ 1 │ 1 │ [1,2] │
│ Hello │ [1,2] │ 2 │ 2 │ [1,2] │
│ World │ [3,4,5] │ 3 │ 1 │ [1,2,3] │
│ World │ [3,4,5] │ 4 │ 2 │ [1,2,3] │
│ World │ [3,4,5] │ 5 │ 3 │ [1,2,3] │
└───────┴─────────┴───┴─────┴─────────────────────┘
CREATE TABLE nested_test
(
s String,
nest Nested(
x UInt8,
y UInt32)
) ENGINE = Memory;
INSERT INTO nested_test
VALUES ('Hello', [1,2], [10,20]), ('World', [3,4,5], [30,40,50]), ('Goodbye', [], []);
│ World │ [3,4,5] │ [30,40,50] │
│ Goodbye │ [] │ [] │
└─────────┴─────────┴────────────┘
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest;
When specifying names of nested data structures in ARRAY JOIN
, the meaning is the same as ARRAY JOIN
with all the array elements that it consists of. Examples are listed below:
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`, `nest.y`;
┌─s─────┬─nest.x─┬─nest.y─┐
│ Hello │ 1 │ 10 │
│ Hello │ 2 │ 20 │
│ World │ 3 │ 30 │
│ World │ 4 │ 40 │
│ World │ 5 │ 50 │
└───────┴────────┴────────┘
This variation also makes sense:
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`;
┌─s─────┬─nest.x─┬─nest.y─────┐
│ Hello │ 1 │ [10,20] │
│ Hello │ 2 │ [10,20] │
│ World │ 3 │ [30,40,50] │
│ World │ 4 │ [30,40,50] │
│ World │ 5 │ [30,40,50] │
└───────┴────────┴────────────┘
An alias may be used for a nested data structure, in order to select either the JOIN
result or the source array. Example:
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest AS n;
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┐
│ Hello │ 1 │ 10 │ [1,2] │ [10,20] │
│ Hello │ 2 │ 20 │ [1,2] │ [10,20] │
│ World │ 3 │ 30 │ [3,4,5] │ [30,40,50] │
│ World │ 4 │ 40 │ [3,4,5] │ [30,40,50] │
│ World │ 5 │ 50 │ [3,4,5] │ [30,40,50] │
└───────┴─────┴─────┴─────────┴────────────┘
Example of using the function:
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`, num
FROM nested_test
ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num;
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┬─num─┐
│ Hello │ 1 │ 10 │ [1,2] │ [10,20] │ 1 │
│ Hello │ 2 │ 20 │ [1,2] │ [10,20] │ 2 │
│ World │ 3 │ 30 │ [3,4,5] │ [30,40,50] │ 1 │
│ World │ 4 │ 40 │ [3,4,5] │ [30,40,50] │ 2 │
└───────┴─────┴─────┴─────────┴────────────┴─────┘
The query execution order is optimized when running ARRAY JOIN
. Although ARRAY JOIN
must always be specified before the WHERE/ clause in a query, technically they can be performed in any order, unless result of is used for filtering. The processing order is controlled by the query optimizer.