13.218. Release 0.55

    Hash Distributed Aggregations

    GROUP BY aggregations are now distributed across a fixed number of machines. This is controlled by the property query.initial-hash-partitions set in etc/config.properties of the coordinator and workers. If the value is larger than the number of machines available during query scheduling, Presto will use all available machines. The default value is 8.

    The maximum memory size of an aggregation is now query.initial-hash-partitions times task.max-memory.

    Simple Distinct Aggregations

    We have added support for the DISTINCT argument qualifier for aggregation functions. This is currently limited to queries without a GROUP BY clause and where all the aggregation functions have the same input expression. For example:
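
    A query of the following shape stays within both restrictions: it has no GROUP BY clause, and there is only one input expression across its aggregations. The table users and the column country are hypothetical names chosen for illustration.

```sql
-- Hypothetical table and column names, for illustration only.
-- Within the stated limits: no GROUP BY, a single input expression.
SELECT count(DISTINCT country)
FROM users;

-- Per the restrictions described above, shapes like these would not
-- yet be expected to work in this release:
--   SELECT count(DISTINCT country), count(DISTINCT city) FROM users;
--   SELECT region, count(DISTINCT country) FROM users GROUP BY region;
```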

    Support for complete DISTINCT functionality is in our roadmap.

    Range Predicate Pushdown

    In addition to receiving range predicates, the connector can also communicate back the ranges of each partition for use in the query optimizer. This can be a major performance gain for JOIN queries where one side of the join has only a few partitions. For example:

    SELECT * FROM data_1_year JOIN data_1_week USING (ds)

    If data_1_year and data_1_week are both partitioned on ds, the connector will report back that one table has partitions for 365 days and the other table has partitions for only 7 days. The optimizer will then limit the scan of the data_1_year table to only the 7 days that could possibly match. These constraints are combined with other predicates in the query to further limit the data scanned.
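
    Conceptually, the effect resembles filtering the larger table before the join. The sketch below only illustrates that effect and is not what the optimizer literally produces; the date literals and the assumption that ds is a string column holding ISO dates are hypothetical.

```sql
-- Illustration only: roughly the scan that results once the optimizer
-- knows that data_1_week covers just seven ds partitions.
-- The date range and the string type of ds are assumptions.
SELECT *
FROM (
    SELECT *
    FROM data_1_year
    WHERE ds BETWEEN '2013-12-01' AND '2013-12-07'
) AS y
JOIN data_1_week AS w USING (ds);
```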

    Note

    This change is backwards incompatible with the previous connector SPI, so if you have written a connector, you will need to update your code before deploying this release.

    json_array_get Function

    Non-reserved Keywords

    The keywords DATE, TIME, TIMESTAMP, and INTERVAL are no longer reserved keywords in the grammar. This means that you can access a column named date without quoting the identifier.
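
    As a small sketch of what this allows, the query below selects a column named date without quoting it. The table events is a hypothetical name used only for illustration.

```sql
-- Hypothetical table "events" with a column named "date".
-- Unquoted form, accepted now that DATE is no longer reserved:
SELECT date
FROM events;

-- Quoted form, which continues to work:
SELECT "date"
FROM events;
```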

    CLI source Option

    The Presto CLI now has an option to set the query source. The source value is shown in the UI and is recorded in events. When using the CLI in shell scripts, it is useful to set the --source option to distinguish shell scripts from normal users.

    SHOW SCHEMAS FROM

    Although the documentation included the syntax SHOW SCHEMAS [FROM catalog], it was not implemented. This release now implements this statement correctly.
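
    For instance, the statement can be used with or without the optional FROM clause. The catalog name hive below is an assumption; substitute a catalog that is configured on your cluster.

```sql
-- Without the optional FROM clause:
SHOW SCHEMAS;

-- List the schemas in a named catalog.
-- "hive" is an assumed catalog name, for illustration only.
SHOW SCHEMAS FROM hive;
```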

    Hive Bucketed Table Fixes

    For queries over Hive bucketed tables, Presto will attempt to limit scans to the buckets that could possibly contain rows that match the WHERE clause. Unfortunately, the algorithm we were using to select the buckets was not correct, and sometimes we would either select the wrong files or fail to select any files. We have aligned the algorithm with Hive, and the optimization now works as expected.