Input/Output#

Configuration#

Note

Daft is currently building out its own native code for reading/writing data. These configuration objects allow users to control behavior when Daft runs native code, but otherwise will have no effect.

These configurations are currently used in:

  1. daft.read_parquet(): controls behavior when reading DataFrames from Parquet files using the native downloader

  2. Expression.url.download(): controls behavior when downloading bytes from URLs using the native downloader

  3. Table.read_parquet: (advanced use case) controls behavior when reading a Daft Table from a Parquet file

daft.io.IOConfig

Creates configurations to be used when accessing storage.

daft.io.S3Config

Creates configurations to be used when accessing an S3-compatible system.

In-Memory Data#

Python Objects#

daft.from_pylist

Creates a DataFrame from a list of dictionaries.

daft.from_pydict

Creates a DataFrame from a Python dictionary.

daft.DataFrame.to_pydict

Converts the current DataFrame to a Python dictionary.

Arrow#

daft.from_arrow

Creates a DataFrame from a pyarrow Table.

daft.DataFrame.to_arrow

Converts the current DataFrame to a pyarrow Table.

Pandas#

daft.from_pandas

Creates a Daft DataFrame from a pandas DataFrame.

daft.DataFrame.to_pandas

Converts the current DataFrame to a pandas DataFrame.

File Paths#

daft.from_glob_path

Creates a DataFrame of file paths and other metadata from a glob path.

Files#

Parquet#

daft.read_parquet

Creates a DataFrame from Parquet file(s).

daft.DataFrame.write_parquet

Writes the DataFrame as Parquet files, returning a new DataFrame with the paths of the files that were written.

CSV#

daft.read_csv

Creates a DataFrame from CSV file(s).

daft.DataFrame.write_csv

Writes the DataFrame as CSV files, returning a new DataFrame with the paths of the files that were written.

JSON#

daft.read_json

Creates a DataFrame from JSON file(s).

Integrations#

Ray Datasets#

daft.from_ray_dataset

Creates a DataFrame from a Ray Dataset.

daft.DataFrame.to_ray_dataset

Converts the current DataFrame to a Ray Dataset, which is useful for running distributed ML model training in Ray.

Dask#

daft.from_dask_dataframe

Creates a Daft DataFrame from a Dask DataFrame.

daft.DataFrame.to_dask_dataframe

Converts the current Daft DataFrame to a Dask DataFrame.