daft.DataFrame.count

DataFrame.count(*cols: Union[Expression, str]) → DataFrame

Performs a global count on the DataFrame.

If no columns are specified (i.e. you call df.count()), or if the only argument is the literal string "*", this behaves much like a COUNT(*) operation in SQL: it counts all rows, including those containing nulls, and returns a new DataFrame with a single column named "count".

>>> import daft
>>> from daft import col
>>> df = daft.from_pydict({"foo": [1, None, None], "bar": [None, 2, 2], "baz": [3, 4, 5]})
>>> df.count().show()  # equivalent to df.count("*").show()
╭────────╮
│ count  │
│ ---    │
│ UInt64 │
╞════════╡
│ 3      │
╰────────╯

(Showing first 1 of 1 rows)

Specifying column names changes the behavior: each named column is counted separately and only its non-null values are included, similar to the SQL query SELECT COUNT(foo), COUNT(bar) FROM df. Passing df.count(col("*")) expands into a non-null count for every column in the DataFrame.

>>> df.count("foo", "bar").show()
╭────────┬────────╮
│ foo    ┆ bar    │
│ ---    ┆ ---    │
│ UInt64 ┆ UInt64 │
╞════════╪════════╡
│ 1      ┆ 2      │
╰────────┴────────╯

(Showing first 1 of 1 rows)
>>> df.count(col("*")).show()
╭────────┬────────┬────────╮
│ foo    ┆ bar    ┆ baz    │
│ ---    ┆ ---    ┆ ---    │
│ UInt64 ┆ UInt64 ┆ UInt64 │
╞════════╪════════╪════════╡
│ 1      ┆ 2      ┆ 3      │
╰────────┴────────┴────────╯

(Showing first 1 of 1 rows)
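
The difference between the two modes comes down to how nulls are treated. The sketch below models that in plain Python, without using Daft; the helper names count_rows and count_non_null are hypothetical and are not part of the Daft API. It is only meant to mirror the results shown in the examples above, not to describe how Daft computes them.

```python
# Plain-Python model of the counting semantics described above.
# This is NOT Daft's implementation; the helper names are illustrative only.
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # column name -> values (None stands for null)


def count_rows(table: Table) -> int:
    """Model of df.count(): number of rows, nulls included."""
    if not table:
        return 0
    # All columns have the same length, so any one of them gives the row count.
    return len(next(iter(table.values())))


def count_non_null(table: Table, *cols: str) -> Dict[str, int]:
    """Model of df.count("a", "b"): non-null values in each named column."""
    return {name: sum(v is not None for v in table[name]) for name in cols}


data = {"foo": [1, None, None], "bar": [None, 2, 2], "baz": [3, 4, 5]}

print(count_rows(data))                    # 3
print(count_non_null(data, "foo", "bar"))  # {'foo': 1, 'bar': 2}
print(count_non_null(data, *data))         # {'foo': 1, 'bar': 2, 'baz': 3}
```

The last call passes every column name, which corresponds to the expansion that df.count(col("*")) performs.
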
Parameters:

*cols (Union[str, Expression]) – columns to count

Returns:

A DataFrame containing the globally aggregated count, as a single row.

Return type:

DataFrame