T-Digest Functions
T-digest was developed by Ted Dunning.
A T-digest is a data sketch which stores approximate percentile information. The Presto type for this data structure is called tdigest, and it accepts a parameter of type double which represents the set of numbers to be ingested by the tdigest. Other numeric types may be added in a future release.
T-digests may be merged without losing precision, and for storage and retrieval they may be cast to/from VARBINARY.
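As a minimal sketch of the storage round trip described above, the query below serializes a digest to VARBINARY and then restores it before querying. The inline VALUES data and the column names are purely illustrative, and spelling the type as tdigest(double) inside CAST is an assumption about the SQL type syntax.

```sql
-- Illustrative only: build a digest, store it as VARBINARY, then read it back.
WITH stored AS (
    SELECT CAST(tdigest_agg(CAST(x AS double)) AS varbinary) AS digest_bytes
    FROM (VALUES 1, 2, 3, 4, 5) AS t(x)
)
-- Assumption: the digest type is written tdigest(double) in a CAST.
SELECT value_at_quantile(CAST(digest_bytes AS tdigest(double)), 0.5) AS approx_median
FROM stored;
```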
Functions
merge(tdigest<double>) → tdigest<double>
Merges all input tdigests into a single tdigest.
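A common use of merge is to build one digest per group and then combine them. The following is a sketch under that assumption; the table contents and the names group_id, latency, and per_group are hypothetical.

```sql
-- Illustrative only: one digest per group, merged into a single digest.
WITH per_group AS (
    SELECT group_id, tdigest_agg(CAST(latency AS double)) AS digest
    FROM (VALUES (1, 10), (1, 20), (2, 30), (2, 40)) AS t(group_id, latency)
    GROUP BY group_id
)
SELECT value_at_quantile(merge(digest), 0.5) AS approx_median
FROM per_group;
```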
value_at_quantile(tdigest<double>, quantile) → double
Returns the approximate percentile value from the T-digest given the number quantile between 0 and 1.
quantile_at_value(tdigest<double>, value) → double
Returns the approximate quantile number between 0 and 1 from the T-digest given an input value. Null is returned if the T-digest is empty or the input value is outside of the range of the digest.
scale_tdigest(tdigest<double>, scale_factor) → tdigest<double>
Returns a tdigest<double> whose distribution has been scaled by a factor specified by scale_factor.
values_at_quantiles(tdigest<double>, quantiles) → array<double>
Returns the approximate percentile values as an array given the input T-digest and array of values between 0 and 1 which represent the quantiles to return.
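The two quantile lookups above might be used together as in the sketch below. The input values and column aliases are made up for illustration, and the results are approximations, so no expected output is shown.

```sql
-- Illustrative only: a single quantile and several quantiles from the same digest.
SELECT
    value_at_quantile(digest, 0.5) AS approx_median,
    values_at_quantiles(digest, ARRAY[0.25, 0.5, 0.75]) AS approx_quartiles
FROM (
    SELECT tdigest_agg(CAST(x AS double)) AS digest
    FROM (VALUES 1, 2, 3, 4, 5, 6, 7, 8) AS t(x)
) AS d;
```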
trimmed_mean(tdigest<double>, lower_quantile, upper_quantile) → double
Returns an estimate of the mean, excluding portions of the distribution outside the provided quantile bounds. Both lower_quantile and upper_quantile must be between 0 and 1.
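For instance, a trimmed mean can reduce the influence of outliers by dropping the tails of the distribution. The sketch below, over hypothetical data, estimates the mean of the values between the 10th and 90th percentiles.

```sql
-- Illustrative only: estimate the mean while ignoring the lowest and highest 10%.
SELECT trimmed_mean(tdigest_agg(CAST(x AS double)), 0.1, 0.9) AS approx_trimmed_mean
FROM (VALUES 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000) AS t(x);
```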
tdigest_agg(x) → tdigest<double>
Returns the tdigest which is composed of all input values of x.
tdigest_agg(x, w) → tdigest<double>
Returns the tdigest which is composed of all input values of x using the per-item weight w.
tdigest_agg(x, w, compression) → tdigest<double>
Returns the tdigest which is composed of all input values of x using the per-item weight w and compression factor compression. compression must be a value greater than zero, and it must be constant for all input rows.
A compression factor of 500 is a good starting point that typically yields good accuracy and performance.
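A weighted aggregation with an explicit compression factor might look like the sketch below. The data and column names are hypothetical, and the use of integer literals for the weights and the compression factor assumes that they are implicitly coerced to the types the function expects.

```sql
-- Illustrative only: each value x counts w times; compression is the constant 500.
SELECT value_at_quantile(tdigest_agg(CAST(x AS double), w, 500), 0.5) AS approx_weighted_median
FROM (VALUES (10, 1), (20, 3), (30, 1)) AS t(x, w);
```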
destructure_tdigest(tdigest<double>) → row<centroid_means array<double>, centroid_weights array<integer>, compression double, min double, max double, sum double, count bigint>