daft.DataFrame.repartition
- DataFrame.repartition(num: Optional[int], *partition_by: Union[Expression, str]) -> DataFrame
Repartitions the DataFrame into num partitions.
If columns are passed in, the DataFrame is repartitioned by those columns; otherwise, rows are repartitioned randomly.
Note
This function will globally shuffle your data, which is potentially a very expensive operation.
If you merely wish to “split” or “coalesce” partitions to obtain a target number of partitions, you may instead wish to consider using DataFrame.into_partitions, which avoids shuffling data in favor of splitting or coalescing adjacent partitions where appropriate.
Example
>>> import daft
>>> df = daft.from_pydict({"x": [1, 2, 3], "y": [4, 5, 6], "z": [7, 8, 9]})
>>> repartitioned_df = df.repartition(3)
>>> repartitioned_df.num_partitions()
3
- Parameters:
num (Optional[int]) – Number of target partitions; if None, the number of partitions will not be changed.
*partition_by (Union[str, Expression]) – Optional columns to partition by.
- Returns:
Repartitioned DataFrame.
- Return type: