daft.read_delta_lake#

daft.read_delta_lake(table: Union[str, DataCatalogTable], io_config: Optional[IOConfig] = None, _multithreaded_io: Optional[bool] = None) → DataFrame

Create a DataFrame from a Delta Lake table.

Example

>>> df = daft.read_delta_lake("some-table-uri")
>>>
>>> # Filters on this dataframe can now be pushed into
>>> # the read operation from Delta Lake.
>>> df = df.where(df["foo"] > 5)
>>> df.show()

Note

This function requires the deltalake package, a Python library for interacting with Delta Lake tables. Install it with `pip install deltalake`.

Parameters:
  • table – Either a URI for the Delta Lake table or a DataCatalogTable instance referencing a table in a data catalog, such as AWS Glue Data Catalog or Databricks Unity Catalog.

  • io_config – A custom IOConfig to use when accessing Delta Lake object storage data. Defaults to None.

  • _multithreaded_io – Whether to use multiple threads for IO. Setting this to False can help reduce system resource usage (number of connections and thread contention) when running on the Ray runner. Defaults to None, which lets Daft decide based on the runner it is currently using.
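For tables stored in cloud object storage, the io_config parameter carries the credentials Daft uses to read the table. The sketch below assumes an S3-hosted table; the bucket URI is a placeholder, and the S3Config field names shown (region_name, key_id, access_key) should be checked against the IOConfig API of your installed Daft version.

```python
import daft
from daft.io import IOConfig, S3Config

# Hedged sketch: configure S3 access explicitly rather than relying on
# environment-provided credentials. All values below are placeholders.
io_config = IOConfig(
    s3=S3Config(
        region_name="us-west-2",       # hypothetical region
        key_id="YOUR_ACCESS_KEY_ID",   # placeholder credential
        access_key="YOUR_SECRET_KEY",  # placeholder credential
    )
)

# Pass the IOConfig so Daft can authenticate when scanning the table.
df = daft.read_delta_lake("s3://my-bucket/my-delta-table", io_config=io_config)
```

If io_config is left as None, Daft falls back to its default credential resolution for the storage backend.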

Returns:

A DataFrame with the schema converted from the specified Delta Lake table.

Return type:

DataFrame