Input/Output#
Configuration#
Note
Daft is currently building out its own native code for reading/writing data. These configuration objects allow users to control behavior when Daft runs native code, but otherwise will have no effect.
These configurations are currently used in:
daft.read_parquet()
: controls behavior when reading DataFrames from Parquet files using the native downloader

Expression.url.download()
: controls behavior when downloading bytes from URLs using the native downloader

Table.read_parquet
: (Advanced use case!) controls behavior when reading a Daft Table from a Parquet file
daft.io.IOConfig | Create configurations to be used when accessing storage
daft.io.S3Config | Create configurations to be used when accessing an S3-compatible system
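A minimal sketch of passing an S3 configuration to the native Parquet reader, assuming anonymous access to a public bucket (the bucket path and region below are placeholders):

    import daft
    from daft.io import IOConfig, S3Config

    # Configure anonymous access to a public bucket in a given region
    # (region and bucket path are placeholder values).
    io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))

    # Pass the configuration to the native Parquet reader.
    df = daft.read_parquet("s3://example-public-bucket/data/*.parquet", io_config=io_config)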
In-Memory Data#
Python Objects#
daft.from_pylist() | Creates a DataFrame from a list of dictionaries.
daft.from_pydict() | Creates a DataFrame from a Python dictionary.
DataFrame.to_pydict() | Converts the current DataFrame to a Python dictionary.
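A minimal sketch of round-tripping a column-oriented dictionary through Daft:

    import daft

    # Build a DataFrame from a dictionary of column name -> list of values.
    df = daft.from_pydict({"id": [1, 2, 3], "name": ["a", "b", "c"]})

    # Convert back to a Python dictionary with the same column-oriented layout.
    data = df.to_pydict()
    print(data["name"])  # ['a', 'b', 'c']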
Arrow#
daft.from_arrow() | Creates a DataFrame from a pyarrow Table.
DataFrame.to_arrow() | Converts the current DataFrame to a pyarrow Table.
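A minimal sketch of converting between a pyarrow Table and a Daft DataFrame:

    import pyarrow as pa
    import daft

    # pyarrow Table -> Daft DataFrame
    arrow_table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})
    df = daft.from_arrow(arrow_table)

    # ...and back to a pyarrow Table after any transformations.
    result = df.to_arrow()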
Pandas#
daft.from_pandas() | Creates a Daft DataFrame from a pandas DataFrame.
DataFrame.to_pandas() | Converts the current DataFrame to a pandas DataFrame.
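A minimal sketch of moving data between pandas and Daft:

    import pandas as pd
    import daft

    pd_df = pd.DataFrame({"x": [1, 2, 3]})

    # pandas DataFrame -> Daft DataFrame
    df = daft.from_pandas(pd_df)

    # Materialize back into pandas (collects the data locally).
    out = df.to_pandas()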
File Paths#
daft.from_glob_path() | Creates a DataFrame of file paths and other metadata from a glob path.
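A minimal sketch of globbing for files; the bucket path below is a placeholder, and each row of the resulting DataFrame describes one matching file (its path plus metadata such as size):

    import daft

    # List all JPEGs under a (placeholder) S3 prefix.
    df = daft.from_glob_path("s3://example-bucket/images/*.jpeg")
    df.show()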
Files#
Parquet#
daft.read_parquet() | Creates a DataFrame from Parquet file(s).
DataFrame.write_parquet() | Writes the DataFrame as Parquet files, returning a new DataFrame with paths to the files that were written.
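A minimal sketch of a Parquet round trip (the input and output paths below are placeholders):

    import daft

    # Read one or more Parquet files (local paths, globs, or object-store URLs).
    df = daft.read_parquet("data/events/*.parquet")

    # Write the DataFrame back out as Parquet files; the returned DataFrame
    # holds the paths of the files that were written.
    written = df.write_parquet("output/events/")
    written.show()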
CSV#
daft.read_csv() | Creates a DataFrame from CSV file(s).
DataFrame.write_csv() | Writes the DataFrame as CSV files, returning a new DataFrame with paths to the files that were written.
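A minimal sketch of a CSV round trip (paths below are placeholders):

    import daft

    # Read CSV file(s) into a DataFrame.
    df = daft.read_csv("data/ratings.csv")

    # Write back out as CSV files and inspect the written paths.
    written = df.write_csv("output/ratings/")
    written.show()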
JSON#
daft.read_json() | Creates a DataFrame from line-delimited JSON file(s).
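A minimal sketch of reading line-delimited JSON (the path below is a placeholder):

    import daft

    # Read newline-delimited JSON, where each line is one JSON object.
    df = daft.read_json("data/logs/*.jsonl")
    df.show()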
Integrations#
Ray Datasets#
daft.from_ray_dataset() | Creates a DataFrame from a Ray Dataset.
DataFrame.to_ray_dataset() | Converts the current DataFrame to a Ray Dataset, which is useful for running distributed ML model training in Ray.
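A minimal sketch of exchanging data with Ray Datasets, assuming Daft is running on the Ray runner (set via daft.context.set_runner_ray()):

    import ray
    import daft

    # Assumes a local Ray instance or a reachable Ray cluster.
    daft.context.set_runner_ray()

    # Ray Dataset -> Daft DataFrame
    ds = ray.data.from_items([{"x": i} for i in range(100)])
    df = daft.from_ray_dataset(ds)

    # ...and back to a Ray Dataset, e.g. to feed distributed model training.
    ds_out = df.to_ray_dataset()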
Dask#
daft.from_dask_dataframe() | Creates a Daft DataFrame from a Dask DataFrame.
DataFrame.to_dask_dataframe() | Converts the current Daft DataFrame to a Dask DataFrame.
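A minimal sketch of converting to and from Dask; note that Daft's Dask interop may require running on the Ray runner (Dask-on-Ray), so treat this as a sketch under that assumption:

    import dask.dataframe as dd
    import pandas as pd
    import daft

    # Build a small Dask DataFrame to convert.
    ddf = dd.from_pandas(pd.DataFrame({"x": [1, 2, 3, 4]}), npartitions=2)

    # Dask DataFrame -> Daft DataFrame -> Dask DataFrame
    df = daft.from_dask_dataframe(ddf)
    ddf_out = df.to_dask_dataframe()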