daft.Expression.approx_count_distinct
Expression.approx_count_distinct() → Expression
Calculates the approximate number of non-NULL unique values in the expression. Approximation is performed using the HyperLogLog algorithm.
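The following is a minimal, self-contained Python sketch that shows how HyperLogLog arrives at an estimate. It is not Daft's implementation. The standalone function approx_count_distinct below is a hypothetical helper. The choice of SHA-1 over repr(value) as the hash, p = 14 register-index bits, and rounding the estimate to an integer are illustrative assumptions. Like the expression, it skips None (NULL) values.

```python
import hashlib
import math


def _hash64(value) -> int:
    """Map a value to a 64-bit integer (SHA-1 of repr(value); illustrative choice)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def approx_count_distinct(values, p: int = 14) -> int:
    """Hypothetical helper: estimate distinct non-None values with HyperLogLog.

    p is the number of hash bits used to pick a register; the sketch keeps
    m = 2**p small integer registers regardless of input size.
    """
    m = 1 << p
    w = 64 - p  # bits left after the register index is taken
    registers = [0] * m

    for v in values:
        if v is None:  # NULLs do not count as distinct values
            continue
        h = _hash64(v)
        idx = h >> w                 # top p bits choose the register
        rest = h & ((1 << w) - 1)    # remaining w bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`;
        # an all-zero `rest` gets rank w + 1.
        rank = w - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    # Standard bias-correction constant alpha_m (valid for m >= 128).
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: use linear counting on empty registers.
        return round(m * math.log(m / zeros))
    return round(raw)


print(approx_count_distinct([1, 2, 3, None]))
```

Each register stores the largest leading-zero rank seen among the hashes routed to it, so memory stays fixed at 2**p registers. The relative standard error of HyperLogLog is roughly 1.04 / sqrt(2**p), which is about 0.8% for p = 14. For small inputs, the sketch falls back to linear counting, so a tiny input such as [1, 2, 3, None] usually produces an exact count. Because hashing is done on repr(value), this sketch treats 1 and 1.0 as different values, which is a simplification.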
Example
A global calculation of the approximate number of distinct values in a column. NULL values are ignored, so the None below is not counted:
>>> import daft
>>> df = daft.from_pydict({"values": [1, 2, 3, None]})
>>> df = df.agg(
...     df["values"].approx_count_distinct().alias("distinct_values"),
... )
>>> df.show()
╭─────────────────╮
│ distinct_values │
│ ---             │
│ UInt64          │
╞═════════════════╡
│ 3               │
╰─────────────────╯

(Showing first 1 of 1 rows)