daft.DataFrame.count
- DataFrame.count(*cols: Union[Expression, str]) -> DataFrame
Performs a global count on the DataFrame.
If no columns are specified (i.e. you call df.count()), or only the literal string "*" is passed, this behaves much like a COUNT(*) operation in SQL: it returns a new DataFrame with a single column named "count".

>>> import daft
>>> from daft import col
>>> df = daft.from_pydict({"foo": [1, None, None], "bar": [None, 2, 2], "baz": [3, 4, 5]})
>>> df.count().show()  # equivalent to df.count("*").show()
╭────────╮
│ count  │
│ ---    │
│ UInt64 │
╞════════╡
│ 3      │
╰────────╯
(Showing first 1 of 1 rows)
Specifying column names instead counts the non-null values in each of those columns, similar to the SQL query SELECT COUNT(foo), COUNT(bar) FROM df. Passing col("*"), as in df.count(col("*")), expands into a count() for each column.

>>> df.count("foo", "bar").show()
╭────────┬────────╮
│ foo    ┆ bar    │
│ ---    ┆ ---    │
│ UInt64 ┆ UInt64 │
╞════════╪════════╡
│ 1      ┆ 2      │
╰────────┴────────╯
(Showing first 1 of 1 rows)

>>> df.count(col("*")).show()
╭────────┬────────┬────────╮
│ foo    ┆ bar    ┆ baz    │
│ ---    ┆ ---    ┆ ---    │
│ UInt64 ┆ UInt64 ┆ UInt64 │
╞════════╪════════╪════════╡
│ 1      ┆ 2      ┆ 3      │
╰────────┴────────┴────────╯
(Showing first 1 of 1 rows)
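The difference between the two behaviours described above (counting rows versus counting non-null values per column) can be modelled in plain Python. This is only an illustrative sketch of the semantics, not how Daft implements count(); the helper names model_count_star and model_count_columns are hypothetical and are not part of the Daft API.

```python
from typing import Any, Dict, List

# A column-oriented table: column name -> list of values (None marks a null).
Table = Dict[str, List[Any]]


def model_count_star(table: Table) -> int:
    """Hypothetical model of df.count() / df.count("*"): number of rows."""
    if not table:
        return 0
    # Every column has the same length, so any column gives the row count.
    return len(next(iter(table.values())))


def model_count_columns(table: Table, *cols: str) -> Dict[str, int]:
    """Hypothetical model of df.count("foo", ...): non-null values per column."""
    return {name: sum(value is not None for value in table[name]) for name in cols}


data = {"foo": [1, None, None], "bar": [None, 2, 2], "baz": [3, 4, 5]}

total = model_count_star(data)                       # rows, nulls included
per_column = model_count_columns(data, "foo", "bar")  # non-null values only
all_columns = model_count_columns(data, *data)        # analogous to col("*")

print(total)        # 3
print(per_column)   # {'foo': 1, 'bar': 2}
print(all_columns)  # {'foo': 1, 'bar': 2, 'baz': 3}
```

The printed values match the doctest outputs above: the row count ignores nulls entirely, while the per-column counts drop them.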
- Parameters:
*cols (Union[str, Expression]) – columns to count
- Returns:
Globally aggregated count, as a DataFrame with a single row.
- Return type: