daft.DataFrame.to_dask_dataframe

DataFrame.to_dask_dataframe(meta: Optional[Union[pd.DataFrame, pd.Series, Dict[str, Any], Iterable[Any], Tuple[Any]]] = None) → dask.DataFrame

Converts the current Daft DataFrame to a Dask DataFrame.

The returned Dask DataFrame will use Dask-on-Ray to execute operations on a Ray cluster.

Note

This function works only when Daft is running on the RayRunner.

Parameters:

meta – An empty pandas DataFrame or Series that matches the dtypes and column names of this DataFrame. Many Dask DataFrame algorithms need this metadata to work. For ease of use, alternative inputs are also accepted: instead of a DataFrame, a dict of {name: dtype} or an iterable of (name, dtype) pairs, where the order of the names must match the order of the columns; instead of a Series, a tuple of (name, dtype). By default, the metadata is inferred from the underlying Daft DataFrame schema, and this argument serves as an optional override.

Returns:

A Dask DataFrame stored on a Ray cluster.

Return type:

dask.DataFrame
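
Example:

The following is a minimal sketch of how the conversion and the meta override might be used, not an authoritative recipe. The column names, the dtype strings, and the helper name daft_to_dask are illustrative assumptions. The sketch also assumes that daft.context.set_runner_ray() is available in your Daft version and that, when called without an address, it starts or attaches to a local Ray instance.

```python
# Two equivalent ways to describe the same metadata override.
# Column names and dtypes here are illustrative assumptions.
META_AS_DICT = {"id": "int64", "name": "object"}        # dict of {name: dtype}
META_AS_PAIRS = [("id", "int64"), ("name", "object")]   # iterable of (name, dtype)


def daft_to_dask(meta=None):
    """Build a small Daft DataFrame and convert it to a Dask DataFrame.

    Hypothetical helper for illustration. Imports are deferred so that this
    module can be loaded on a machine without Daft, Ray, or Dask installed.
    """
    import daft

    # to_dask_dataframe requires the RayRunner, so select it before
    # building any DataFrame.
    daft.context.set_runner_ray()

    df = daft.from_pydict({"id": [1, 2, 3], "name": ["a", "b", "c"]})

    # With meta=None, the metadata is inferred from the Daft schema;
    # passing a dict or a list of pairs overrides that inference.
    return df.to_dask_dataframe(meta=meta)
```

Calling daft_to_dask() relies on the inferred metadata, while daft_to_dask(meta=META_AS_DICT) or daft_to_dask(meta=META_AS_PAIRS) supplies the override explicitly. In both cases the result is a lazy Dask DataFrame; calling .compute() on it should materialize a pandas DataFrame.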