daft.DataFrame.repartition


DataFrame.repartition(num: Optional[int], *partition_by: Union[Expression, str]) → DataFrame

Repartitions the DataFrame into num partitions.

If columns are passed in, the DataFrame is repartitioned by the values in those columns; otherwise, rows are repartitioned randomly.
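The two modes described above can be illustrated with a toy, pure-Python model that treats a DataFrame as a list of partitions, each a list of row dicts. This is a sketch, not Daft's implementation: the helper name toy_repartition is hypothetical, and it assumes that partitioning by columns means hash partitioning on the key values, so rows with equal keys always land in the same output partition.

```python
import random
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def toy_repartition(
    partitions: List[List[Row]],
    num: Optional[int],
    *partition_by: str,
    seed: int = 0,
) -> List[List[Row]]:
    """Hypothetical toy model of repartitioning (NOT Daft's implementation)."""
    # num=None keeps the current partition count, mirroring the documented behavior.
    target = len(partitions) if num is None else num
    if target < 1:
        raise ValueError("target number of partitions must be at least 1")
    out: List[List[Row]] = [[] for _ in range(target)]
    rng = random.Random(seed)
    # Every row of every input partition is routed to a new partition: a full shuffle.
    for part in partitions:
        for row in part:
            if partition_by:
                # Assumed hash partitioning: equal key values map to the same index.
                key = tuple(row[c] for c in partition_by)
                idx = hash(key) % target
            else:
                # No columns given: send the row to a randomly chosen partition.
                idx = rng.randrange(target)
            out[idx].append(row)
    return out
```

For example, toy_repartition([[{"x": 1}, {"x": 2}], [{"x": 1}]], 3, "x") returns three partitions in which both rows with x == 1 sit together, while toy_repartition(parts, None) shuffles rows randomly but keeps the original partition count.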

Note

This function will globally shuffle your data, which is potentially a very expensive operation.

If instead you merely wish to “split” or “coalesce” partitions to obtain a target number of partitions, you may wish to use DataFrame.into_partitions instead, which avoids shuffling data by splitting or coalescing adjacent partitions where appropriate.
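To show why coalescing adjacent partitions is cheaper than a shuffle, here is a toy, pure-Python sketch of the coalescing half only. The helper name toy_coalesce is hypothetical and this is not how DataFrame.into_partitions is implemented; it simply assumes each output partition is built by concatenating a contiguous run of input partitions, so no row is hashed, compared, or reordered.

```python
from typing import List, TypeVar

T = TypeVar("T")


def toy_coalesce(partitions: List[List[T]], num: int) -> List[List[T]]:
    """Hypothetical toy model: merge adjacent partitions into `num` groups."""
    n = len(partitions)
    if not 1 <= num <= n:
        raise ValueError("num must be between 1 and the current partition count")
    out: List[List[T]] = []
    for i in range(num):
        # Output partition i takes a contiguous, non-empty slice of input partitions.
        start = i * n // num
        end = (i + 1) * n // num
        merged: List[T] = []
        for part in partitions[start:end]:
            merged.extend(part)
        out.append(merged)
    return out
```

For instance, toy_coalesce([[1, 2], [3], [4, 5], [6]], 2) gives [[1, 2, 3], [4, 5, 6]]: rows keep their original order and only move between neighboring partitions, whereas a shuffling repartition may send any row to any output partition.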

Example

>>> import daft
>>> df = daft.from_pydict({"x": [1, 2, 3], "y": [4, 5, 6], "z": [7, 8, 9]})
>>> repartitioned_df = df.repartition(3)
>>> repartitioned_df.num_partitions()
3

Parameters:
  • num (Optional[int]) – Number of target partitions; if None, the number of partitions will not be changed.

  • *partition_by (Union[str, Expression]) – Optional columns to partition by.

Returns:

Repartitioned DataFrame.

Return type:

DataFrame