daft.from_glob_path

daft.from_glob_path(path: str, fs: Optional[fsspec.spec.AbstractFileSystem] = None) → daft.dataframe.dataframe.DataFrame

Creates a DataFrame of file paths and other metadata from a glob path.

This method supports wildcards:

  1. “*” matches any number of characters, including none

  2. “?” matches any single character

  3. “[…]” matches any single character in the brackets

  4. “**” recursively matches any number of layers of directories
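The four wildcards can be tried out without Daft. The sketch below uses Python's standard-library glob module, whose wildcard rules are similar to those listed above but are not guaranteed to match Daft's behaviour in every edge case (hidden files, for example). The directory layout and the demo_wildcards helper are hypothetical and exist only for this illustration.

```python
import glob
import os
import tempfile


def demo_wildcards():
    """Build a small temporary tree and return the matches for each pattern."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout, created only for this illustration.
        for rel in (
            "image-1.jpeg",
            "image-2.jpeg",
            "image-10.jpeg",
            "nested/deep/image-3.jpeg",
        ):
            full = os.path.join(root, rel)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, "wb").close()

        def match(pattern, recursive=False):
            # Return matches relative to the temporary root, with "/" separators.
            hits = glob.glob(os.path.join(root, pattern), recursive=recursive)
            return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

        return {
            "*.jpeg": match("*.jpeg"),                        # "*": any run of characters
            "image-?.jpeg": match("image-?.jpeg"),            # "?": exactly one character
            "image-[12].jpeg": match("image-[12].jpeg"),      # "[…]": one character from the set
            "**/*.jpeg": match("**/*.jpeg", recursive=True),  # "**": any depth of directories
        }


results = demo_wildcards()
for pattern, matches in results.items():
    print(pattern, "->", matches)
```

Note that "image-?.jpeg" does not match image-10.jpeg, because “?” stands for exactly one character, and that only the “**” pattern reaches the file under nested/deep.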

The returned DataFrame will have the following columns:

  1. path: the path to the file/directory

  2. size: size of the object in bytes

  3. type: either “file” or “directory”
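To make the shape of the result concrete, the sketch below builds rows with the same three columns using only the Python standard library. It is an illustration of the column layout, not Daft's implementation: the describe helper is hypothetical, and the size reported for a directory depends on the operating system and may differ from what Daft reports.

```python
import glob
import os
import tempfile


def describe(pattern):
    """Return one dict per match with path, size and type entries."""
    rows = []
    for path in sorted(glob.glob(pattern, recursive=True)):
        rows.append(
            {
                "path": path,
                "size": os.path.getsize(path),  # size in bytes
                "type": "directory" if os.path.isdir(path) else "file",
            }
        )
    return rows


with tempfile.TemporaryDirectory() as root:
    # Hypothetical contents: one 5-byte file and one empty directory.
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "a.bin"), "wb") as f:
        f.write(b"12345")

    rows = describe(os.path.join(root, "*"))
    for row in rows:
        print(row)
```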

Example

>>> df = daft.from_glob_path("/path/to/files/*.jpeg")
>>> df = daft.from_glob_path("/path/to/files/**/*.jpeg")
>>> df = daft.from_glob_path("/path/to/files/**/image-?.jpeg")
Parameters
  • path (str) – Path to files on disk (allows wildcards).

  • fs (fsspec.AbstractFileSystem, optional) – fsspec FileSystem to use for globbing and fetching metadata. If None (the default), Daft constructs a FileSystem instance internally.

Returns

DataFrame containing the path to each file as a row, along with other metadata parsed from the provided filesystem.

Return type

DataFrame