T-Digest Functions
Presto implements two algorithms for estimating rank-based metrics: quantile digest and T-digest. T-digest generally performs better, while the Presto implementation of quantile digests supports more numeric types. T-digest has better accuracy at the tails, often dramatically better, but may have worse accuracy at the median, depending on the compression factor used. In comparison, quantile digests support a maximum rank error, which guarantees relative uniformity of precision along the quantiles. Quantile digests are also formally proven to support lossless merges, while T-digest is not (although it does empirically demonstrate lossless merges).
T-digest was developed by Ted Dunning.
Data Structures
A T-digest is a data sketch which stores approximate percentile information. The Presto type for this data structure is called tdigest, and it accepts a parameter of type double which represents the set of numbers to be ingested by the tdigest. Other numeric types may be added in a future release.
T-digests may be merged without losing precision, and for storage and retrieval they may be cast to/from VARBINARY.
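As a rough illustration of the storage round trip described above, the following sketch serializes a digest to VARBINARY and reads it back. The inline values and the names x, serialized, and stored are made up for the example, and the spelling tdigest(double) for the cast target is an assumption about how the type is written inside a CAST.

```sql
-- Hypothetical data: build a digest, serialize it, then restore it.
SELECT value_at_quantile(CAST(serialized AS tdigest(double)), 0.5E0) AS approx_median
FROM (
    -- Cast the digest to VARBINARY, as one would before storing it in a table.
    SELECT CAST(tdigest_agg(CAST(x AS double)) AS varbinary) AS serialized
    FROM (VALUES 1, 2, 3, 4, 5) AS t(x)
) AS stored;
```

In practice the inner query would typically write to a table and the outer query would read from it later.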
Functions

merge(tdigest<double>) → tdigest<double>
Merges all input tdigests into a single tdigest.
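A minimal sketch of how merge might be used: one digest is built per group, and the per-group digests are then combined into a single digest for the whole data set. The grp and x columns and the inline values are hypothetical.

```sql
-- Build one digest per group, then merge the group digests into one.
SELECT value_at_quantile(merge(digest), 0.5E0) AS overall_median
FROM (
    SELECT grp, tdigest_agg(CAST(x AS double)) AS digest
    FROM (VALUES ('a', 1), ('a', 2), ('b', 3), ('b', 4)) AS t(grp, x)
    GROUP BY grp
) AS per_group;
```

This pattern is useful when the per-group digests are precomputed and stored, so that coarser rollups do not need to rescan the raw values.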

value_at_quantile(tdigest<double>, quantile) → double
Returns the approximate percentile value from the T-digest given the number quantile between 0 and 1.
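For example, an approximate 90th percentile could be read from a freshly built digest as sketched below. The inline values are hypothetical, and the quantile is written as a double literal (0.9E0) on the assumption that the function expects a double argument.

```sql
-- Approximate 90th percentile of ten sample values.
SELECT value_at_quantile(tdigest_agg(CAST(x AS double)), 0.9E0) AS approx_p90
FROM (VALUES 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) AS t(x);
```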

quantile_at_value(tdigest<double>, value) → double
Returns the approximate quantile number between 0 and 1 from the T-digest given an input value. Null is returned if the T-digest is empty or the input value is outside of the range of the digest.
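The sketch below looks up one value inside the range of the digest and one outside it. The data is hypothetical; per the description above, the second column is expected to be NULL because 1000 lies outside the ingested range.

```sql
-- Quantile lookup for a value inside and a value outside the digest's range.
SELECT
    quantile_at_value(digest, 30.0E0)   AS q_inside,
    quantile_at_value(digest, 1000.0E0) AS q_outside
FROM (
    SELECT tdigest_agg(CAST(x AS double)) AS digest
    FROM (VALUES 10, 20, 30, 40, 50) AS t(x)
) AS d;
```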

scale_tdigest(tdigest<double>, scale_factor) → tdigest<double>
Returns a tdigest whose distribution has been scaled by a factor specified by scale_factor.

values_at_quantiles(tdigest<double>, quantiles) → array<double>
Returns the approximate percentile values as an array given the input T-digest and an array of values between 0 and 1 which represent the quantiles to return.
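Several quantiles can be requested in one call, which avoids building or scanning the digest repeatedly. A minimal sketch with hypothetical data follows; the array elements are written as double literals on the assumption that the function expects an array of doubles.

```sql
-- Approximate quartiles from a single digest.
SELECT values_at_quantiles(
           tdigest_agg(CAST(x AS double)),
           ARRAY[0.25E0, 0.5E0, 0.75E0]
       ) AS approx_quartiles
FROM (VALUES 1, 2, 3, 4, 5, 6, 7, 8) AS t(x);
```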

tdigest_agg(x) → tdigest<double>
Returns the tdigest which is composed of all input values of x.

tdigest_agg(x, w) → tdigest<double>
Returns the tdigest which is composed of all input values of x using the per-item weight w.
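The weighted form can be sketched as follows, with each row carrying a value and the number of times it should count. The data is hypothetical, and the cast of w to bigint reflects an assumption that weights are integer counts.

```sql
-- Each row is (value, weight); the value 1 counts ten times as much as 2 or 3.
SELECT value_at_quantile(
           tdigest_agg(CAST(x AS double), CAST(w AS bigint)),
           0.5E0
       ) AS weighted_median
FROM (VALUES (1, 10), (2, 1), (3, 1)) AS t(x, w);
```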

tdigest_agg(x, w, accuracy) → tdigest<double>
Returns the tdigest which is composed of all input values of x using the per-item weight w and a maximum error of accuracy. accuracy must be a value greater than zero and less than one, and it must be constant for all input rows.

destructure_tdigest(tdigest<double>) → row<centroid_means array<double>, centroid_weights array<integer>, compression double, min double, max double, sum double, count bigint>
Returns a row that represents a tdigest data structure in the form of its component parts. These include arrays of the centroid means and weights, the compression factor, and the minimum, maximum, sum and count of the values in the digest.
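To inspect a digest, individual fields of the returned row can be selected by name, as in the sketch below. The data and the alias parts are hypothetical; the field names are taken from the signature above.

```sql
-- Pull a few scalar fields out of the destructured digest.
SELECT parts.min, parts.max, parts.count, parts.compression
FROM (
    SELECT destructure_tdigest(tdigest_agg(CAST(x AS double))) AS parts
    FROM (VALUES 1, 2, 3, 4, 5) AS t(x)
) AS d;
```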