daft.Expression.approx_percentiles
- Expression.approx_percentiles(percentiles: float | list[float]) -> Expression
Calculates the approximate percentile(s) for a column of numeric values.
For numeric columns, we use the sketches_ddsketch crate, a Rust implementation of the paper DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees (Masson et al.).
Null values are ignored when computing the percentiles.
If all values are Null, the result is also Null.
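To make the relative-error idea concrete, here is a minimal pure-Python sketch of logarithmic bucketing in the style of DDSketch. It is not Daft's or the crate's code: the relative accuracy `ALPHA = 0.01`, the rank rule `q * (n - 1)`, and the helper names `bucket_key`, `bucket_value` and `approx_percentile` are assumptions made for illustration, and the sketch only handles positive values.

```python
import math
from collections import Counter

ALPHA = 0.01  # assumed relative accuracy, not taken from the Daft docs
GAMMA = (1 + ALPHA) / (1 - ALPHA)  # ratio between consecutive bucket boundaries


def bucket_key(value):
    """Index k of the bucket (GAMMA**(k-1), GAMMA**k] holding a positive value."""
    return math.ceil(math.log(value) / math.log(GAMMA))


def bucket_value(key):
    """Representative of bucket k, within ALPHA relative error of its members."""
    return 2 * GAMMA ** key / (GAMMA + 1)


def approx_percentile(values, q):
    """Approximate the q-th percentile (q in [0, 1]) of positive numbers.

    None entries are skipped; an all-None input yields None.
    """
    data = [v for v in values if v is not None]
    if not data:
        return None
    # Only per-bucket counts are kept, never the raw values.
    buckets = Counter(bucket_key(v) for v in data)
    rank = q * (len(data) - 1)  # assumed rank convention
    seen = 0
    for key in sorted(buckets):
        seen += buckets[key]
        if seen > rank:
            return bucket_value(key)
    return bucket_value(max(buckets))


median = approx_percentile([1, 2, 3, 4, 5, None], 0.5)
print(median)  # close to, but not exactly, the true median 3
```

Because each value is replaced by its bucket's representative, results are near the true percentile rather than equal to it, which is why an approximate median of integer data need not be an integer.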
If percentiles is supplied as a single float, then the resultant column is a Float64 column.
If percentiles is supplied as a list, then the resultant column is a FixedSizeList[Float64; N] column, where N is the length of the supplied list.
Example
A global calculation of approximate percentiles:
>>> import daft
>>> df = daft.from_pydict({"scores": [1, 2, 3, 4, 5, None]})
>>> df = df.agg(
...     df["scores"].approx_percentiles(0.5).alias("approx_median_score"),
...     df["scores"].approx_percentiles([0.25, 0.5, 0.75]).alias("approx_percentiles_scores"),
... )
>>> df.show()
╭─────────────────────┬────────────────────────────────╮
│ approx_median_score ┆ approx_percentiles_scores      │
│ ---                 ┆ ---                            │
│ Float64             ┆ FixedSizeList[Float64; 3]      │
╞═════════════════════╪════════════════════════════════╡
│ 2.9742334234767163  ┆ [1.993661701417351, 2.9742334… │
╰─────────────────────┴────────────────────────────────╯
(Showing first 1 of 1 rows)
A grouped calculation of approximate percentiles:
>>> df = daft.from_pydict({"class": ["a", "a", "a", "b", "c"], "scores": [1, 2, 3, 1, None]})
>>> df = df.groupby("class").agg(
...     df["scores"].approx_percentiles(0.5).alias("approx_median_score"),
...     df["scores"].approx_percentiles([0.25, 0.5, 0.75]).alias("approx_percentiles_scores"),
... )
>>> df.show()
╭───────┬─────────────────────┬────────────────────────────────╮
│ class ┆ approx_median_score ┆ approx_percentiles_scores      │
│ ---   ┆ ---                 ┆ ---                            │
│ Utf8  ┆ Float64             ┆ FixedSizeList[Float64; 3]      │
╞═══════╪═════════════════════╪════════════════════════════════╡
│ c     ┆ None                ┆ None                           │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a     ┆ 1.993661701417351   ┆ [0.9900000000000001, 1.993661… │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b     ┆ 0.9900000000000001  ┆ [0.9900000000000001, 0.990000… │
╰───────┴─────────────────────┴────────────────────────────────╯
(Showing first 3 of 3 rows)
- Parameters:
percentiles – the percentile(s) at which to find approximate values. Can be provided as a single float or a list of floats.
- Returns:
A new expression representing the approximate percentile(s). If percentiles was a single float, this will be a new Float64 expression. If percentiles was a list of floats, this will be a new expression with type FixedSizeList[Float64, len(percentiles)].
- Return type:
Expression