Release 0.57
Note
approx_distinct() should be used in preference to COUNT(DISTINCT ...) whenever an approximate answer is acceptable, as it is substantially faster and has no limit on the number of distinct items it can process. COUNT(DISTINCT ...), by contrast, must transfer every item over the network and keep each distinct item in memory.
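To illustrate the trade-off in the note above, here is a minimal Python sketch that contrasts an exact distinct count with an approximate one. The exact version keeps every distinct item in a set, so its memory grows with cardinality. The approximate version is a simplified HyperLogLog estimator, one common technique for approximate distinct counting, which uses a fixed array of small registers no matter how many items it sees. The names exact_distinct and HyperLogLog are hypothetical, and this is not Presto's implementation of approx_distinct().

```python
import hashlib
import math


def exact_distinct(items):
    """Exact count: memory grows with the number of distinct items."""
    seen = set()
    for item in items:
        seen.add(item)
    return len(seen)


class HyperLogLog:
    """Simplified HyperLogLog estimator with 2**p registers (fixed memory).

    Hypothetical illustration only; the alpha constant below is valid for p >= 7.
    """

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128 registers.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item):
        # Take the first 8 bytes of a SHA-1 digest as a 64-bit hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash64(item)
        width = 64 - self.p
        index = h >> width              # top p bits select a register
        rest = h & ((1 << width) - 1)   # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw


items = [i % 50_000 for i in range(200_000)]  # 50,000 distinct values, each repeated
print(exact_distinct(items))  # 50000

hll = HyperLogLog()
for x in items:
    hll.add(x)
print(round(hll.estimate()))  # close to 50000, using only 4096 registers
```

With p=12 the estimator's typical relative error is about 1.04/sqrt(4096), roughly 1.6%, while its memory stays constant regardless of input size.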
All Hive connectors support reading data from Amazon S3. This requires two additional catalog properties for the Hive connector that specify your AWS Access Key ID and Secret Access Key:
hive.s3.aws-access-key=AKIAIOSFODNN7EXAMPLE
hive.s3.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Allow specifying catalog and schema in the JDBC Driver URL.
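The note above does not spell out the URL syntax. Assuming the form jdbc:presto://host:port/catalog/schema, where the catalog and schema segments are both optional, the following hypothetical Python helper shows how those two values are carried in the URL path. It is not part of the Presto JDBC driver; the function name, the host example.net, port 8080, and the catalog/schema names hive and sales are placeholders.

```python
from urllib.parse import urlsplit


def parse_presto_jdbc_url(url):
    """Hypothetical helper: split a Presto JDBC URL into its parts.

    Assumes the form jdbc:presto://host:port[/catalog[/schema]].
    """
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + url)
    parts = urlsplit(url[len(prefix):])
    if parts.scheme != "presto":
        raise ValueError("not a Presto JDBC URL: " + url)
    # Non-empty path segments are, in order, the catalog and the schema.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) > 2:
        raise ValueError("expected at most /catalog/schema in the path: " + url)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "catalog": segments[0] if len(segments) > 0 else None,
        "schema": segments[1] if len(segments) > 1 else None,
    }


print(parse_presto_jdbc_url("jdbc:presto://example.net:8080/hive/sales"))
# {'host': 'example.net', 'port': 8080, 'catalog': 'hive', 'schema': 'sales'}
```

In a Java client, a URL of this assumed form would be passed unchanged to DriverManager.getConnection, with the Presto JDBC driver on the classpath.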
Allow certain custom InputFormats to work by propagating Hive serialization properties to the RecordReader.
Many execution engine performance improvements.