daft.DataFrame.pivot

DataFrame.pivot(group_by: Union[Expression, str, Iterable[Union[Expression, str]]], pivot_col: Union[Expression, str], value_col: Union[Expression, str], agg_fn: str, names: Optional[List[str]] = None) -> DataFrame

Pivots a column of the DataFrame and performs an aggregation on the values.

Note

If you already know the distinct values of the pivot column, pass them via names. This is more efficient because it avoids a distinct operation. Without this list, Daft performs a distinct operation on the pivot column to determine the unique values to pivot on.

Example

>>> import daft
>>> data = {
...     "id": [1, 2, 3, 4],
...     "version": ["3.8", "3.8", "3.9", "3.9"],
...     "platform": ["macos", "macos", "macos", "windows"],
...     "downloads": [100, 200, 150, 250],
... }
>>> df = daft.from_pydict(data)
>>> df = df.pivot("version", "platform", "downloads", "sum")
>>> df.show()
╭─────────┬─────────┬───────╮
│ version ┆ windows ┆ macos │
│ ---     ┆ ---     ┆ ---   │
│ Utf8    ┆ Int64   ┆ Int64 │
╞═════════╪═════════╪═══════╡
│ 3.9     ┆ 250     ┆ 150   │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3.8     ┆ None    ┆ 300   │
╰─────────┴─────────┴───────╯

(Showing first 2 of 2 rows)

Parameters:
  • group_by (ManyColumnsInputType) – columns to group by

  • pivot_col (Union[str, Expression]) – column to pivot

  • value_col (Union[str, Expression]) – column to aggregate

  • agg_fn (str) – name of the aggregation function to apply, for example "sum"

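  • names (Optional[List[str]]) – names of the pivoted columns, i.e. the distinct values of pivot_col to pivot on; if omitted, Daft determines them with a distinct operation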

Returns:

DataFrame with pivoted columns

Return type:

DataFrame
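
To make the semantics concrete, here is a plain-Python sketch of what a "sum" pivot computes. It is an illustration only, not Daft's implementation: pivot_sum is a hypothetical helper, it handles a single group_by column, and it assumes a group with no matching rows for a pivot value gets None in that cell, as in the example output above.

```python
from collections import defaultdict


def pivot_sum(rows, group_by, pivot_col, value_col, names=None):
    """Hypothetical helper: group rows, then spread pivot values into columns."""
    if names is None:
        # Stand-in for the distinct operation: unique pivot values, first-seen order.
        names = list(dict.fromkeys(row[pivot_col] for row in rows))
    totals = defaultdict(dict)
    for row in rows:
        cells = totals[row[group_by]]  # one dict of pivot cells per group
        col = row[pivot_col]
        if col in names:
            cells[col] = cells.get(col, 0) + row[value_col]
    # Missing (group, pivot value) combinations become None.
    return [
        {group_by: key, **{name: cells.get(name) for name in names}}
        for key, cells in totals.items()
    ]


rows = [
    {"version": "3.8", "platform": "macos", "downloads": 100},
    {"version": "3.8", "platform": "macos", "downloads": 200},
    {"version": "3.9", "platform": "macos", "downloads": 150},
    {"version": "3.9", "platform": "windows", "downloads": 250},
]
table = pivot_sum(rows, "version", "platform", "downloads")
print(table)
```

Each distinct value of the pivot column becomes an output column, and each group contributes one row.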