daft.read_parquet

daft.read_parquet(path: Union[str, List[str]], schema_hints: Optional[Dict[str, daft.datatype.DataType]] = None, fs: Optional[fsspec.spec.AbstractFileSystem] = None, io_config: Optional[IOConfig] = None, use_native_downloader: bool = False) → daft.dataframe.dataframe.DataFrame

Creates a DataFrame from one or more Parquet files.

Example

>>> df = daft.read_parquet("/path/to/file.parquet")
>>> df = daft.read_parquet("/path/to/directory")
>>> df = daft.read_parquet("/path/to/files-*.parquet")
>>> df = daft.read_parquet("s3://path/to/files-*.parquet")

Parameters

  • path (str or list[str]) – Path to the Parquet file(s), or a list of such paths. A path may point to a single file or a directory, or contain wildcards.

  • schema_hints (dict[str, DataType]) – A mapping from column names to datatypes. Passing this option disables all schema inference on the data being read, and an error is raised if the data is incompatible with the given datatypes.

  • fs (fsspec.AbstractFileSystem) – fsspec FileSystem to use for reading data. By default, Daft will automatically construct a FileSystem instance internally.

  • io_config (IOConfig) – Config to be used with the native downloader

  • use_native_downloader (bool) – Whether to use the native downloader instead of PyArrow for reading Parquet. Defaults to False. This is currently experimental.

Returns

parsed DataFrame

Return type

DataFrame