11.181. Release 0.60
JDBC driver

The JDBC driver is now always packaged as a standalone jar without any dependencies. Previously, this artifact was published with the Maven classifier standalone. The new build does not publish this artifact anymore.
USE CATALOG and USE SCHEMA
The CLI now supports USE CATALOG and USE SCHEMA.
TPCH connector

We have added a new connector that will generate synthetic data following the TPC-H specification. This connector makes it easy to generate large data sets for testing and bug reports. When generating bug reports, we encourage users to use this catalog since it eases the process of reproducing the issue. The data is generated dynamically for each query, so no disk space is used by this connector. To add the tpch catalog to your system, create the catalog property file on both the coordinator and workers with the following contents:
Additionally, update the datasources property in the config properties file, etc/config.properties, for the workers to include tpch.
SPI changes
The Connector interface now has explicit methods for supplying the services expected by the query engine. Previously, this was handled by a generic getService method.
Note

This is a backwards incompatible change to the Connector SPI, so if you have written a connector, you will need to update your code before deploying this release.
Additionally, we have added the NodeManager interface to the SPI to allow a plugin to detect all nodes in the Presto cluster. This is important for some connectors that can divide a table evenly between all nodes as long as the connector knows how many nodes exist. To access the node manager, simply add the following to the plugin class:
Optimizations

For queries with the following form:
We have added an optimization that stops the query as soon as N distinct rows are found.
When optimizing a join, Presto analyzes the ranges of the partitions on each side of a join and pushes these ranges to the other side. When tables have a lot of partitions, this can result in a very large filter with one expression for each partition. The optimizer now summarizes the predicate ranges to reduce the complexity of the filters.
Window functions with a PARTITION BY clause are now distributed based on the partition key.
Bug fixes
- Scheduling
In the changes to schedule splits in batches, we introduced two bugs that resulted in an unbalanced workload across nodes, which increases query latency. The first problem was not inspecting the queued split count of the nodes while scheduling the batch, and the second problem was not counting the splits awaiting creation in the task executor.
- JSON conversion of complex Hive types

Presto converts complex Hive types (array, map, struct and union) into JSON. Previously, numeric keys in maps were converted to numbers, not strings, which is invalid as JSON only allows strings for object keys. This prevented the JSON Functions and Operators from working.
- Hive hidden files
Presto will now ignore files in Hive that start with an underscore (_) or a dot (.). This matches the behavior of Hadoop MapReduce / Hive.
- Failures incorrectly reported as no data
Certain types of failures would result in the query appearing to succeed and return an incomplete result (often zero rows). There was a race condition between the error propagation and query teardown. In some cases, the query would be torn down before the exception made it to the coordinator. This was a regression introduced during the query teardown optimization work. There are now tests to catch this type of bug.
- Exchange client leak
When a query finished early (e.g., limit or failure) and the exchange operator was blocked waiting for data from other nodes, the exchange was not closed properly. This resulted in continuous failing HTTP requests which leaked resources and produced large log files.
- Hash partitioning
- Compiled NULL literal
In some cases, queries with a select expression like CAST(NULL AS varchar) would fail due to a bug in the output type detection code in the expression compiler.