daft.Expression.approx_count_distinct


Expression.approx_count_distinct() → Expression

Calculates the approximate number of non-NULL unique values in the expression.

Approximation is performed using the HyperLogLog algorithm.
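The following minimal pure-Python sketch shows roughly how a HyperLogLog estimate is formed. It is not Daft's implementation. The register-count parameter `p`, the BLAKE2b-over-`repr` hash, and the helper names `_hash64` and `hll_estimate` are hypothetical choices made for illustration. Each non-NULL value is hashed. The top `p` bits of the hash select a register, and that register keeps the largest "rank" it has seen, where rank is the position of the leftmost 1-bit in the remaining bits. A bias-corrected harmonic mean of the registers gives the estimate, and linear counting takes over for small cardinalities.

```python
import hashlib
import math


def _hash64(value) -> int:
    # Hypothetical stable 64-bit hash: BLAKE2b over the value's repr (demo only).
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hll_estimate(values, p: int = 12) -> int:
    """Approximate the number of distinct non-None values with 2**p registers.

    Sketch only; the alpha constant below assumes p >= 7 (m >= 128).
    """
    m = 1 << p
    registers = [0] * m
    tail_bits = 64 - p
    for v in values:
        if v is None:  # NULLs are skipped, mirroring the documented behaviour
            continue
        h = _hash64(v)
        idx = h >> tail_bits                 # top p bits choose the register
        rest = h & ((1 << tail_bits) - 1)    # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in `rest` (all zeros -> max rank)
        rank = tail_bits - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)  # bias-correction constant for large m
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: linear counting on the empty registers.
        return round(m * math.log(m / zeros))
    return round(raw)


print(hll_estimate([1, 2, 3, None]))
```

Because each register keeps only a maximum, duplicate values never change the result, which is why a fixed amount of memory (`2**p` small integers here) suffices regardless of how many rows are scanned.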

Example

A global calculation of the approximate number of distinct values in a column that contains a NULL (the NULL is not counted):

>>> import daft
>>> df = daft.from_pydict({"values": [1, 2, 3, None]})
>>> df = df.agg(
...     df["values"].approx_count_distinct().alias("distinct_values"),
... )
>>> df.show()
╭─────────────────╮
│ distinct_values │
│ ---             │
│ UInt64          │
╞═════════════════╡
│ 3               │
╰─────────────────╯

(Showing first 1 of 1 rows)