daft.DataFrame.groupby

DataFrame.groupby(*group_by: Union[Expression, str, Iterable[Union[Expression, str]]]) → GroupedDataFrame

Groups the rows of the DataFrame by the given columns so that aggregations can be computed per group.

Example

>>> import daft
>>> from daft import col
>>> df = daft.from_pydict({
...     "pet": ["cat", "dog", "dog", "cat"],
...     "age": [1, 2, 3, 4],
...     "name": ["Alex", "Jordan", "Sam", "Riley"]
... })
>>> grouped_df = df.groupby("pet").agg(
...     col("age").min().alias("min_age"),
...     col("age").max().alias("max_age"),
...     col("pet").count().alias("count"),
...     col("name").any_value()
... )
>>> grouped_df.show()
╭──────┬─────────┬─────────┬────────┬────────╮
│ pet  ┆ min_age ┆ max_age ┆ count  ┆ name   │
│ ---  ┆ ---     ┆ ---     ┆ ---    ┆ ---    │
│ Utf8 ┆ Int64   ┆ Int64   ┆ UInt64 ┆ Utf8   │
╞══════╪═════════╪═════════╪════════╪════════╡
│ cat  ┆ 1       ┆ 4       ┆ 2      ┆ Alex   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ dog  ┆ 2       ┆ 3       ┆ 2      ┆ Jordan │
╰──────┴─────────┴─────────┴────────┴────────╯

(Showing first 2 of 2 rows)

Parameters:

*group_by (Union[Expression, str, Iterable[Union[Expression, str]]]) – columns to group by, given as column names, expressions, or an iterable of either

Returns:

A grouped DataFrame on which aggregations can be applied

Return type:

GroupedDataFrame