daft.DataFrame.to_torch_iter_dataset

DataFrame.to_torch_iter_dataset() → TorchIterableDataset

Convert the current DataFrame into a Torch IterableDataset for use with PyTorch.

Begins execution of the DataFrame if it has not yet been executed.

Items will be returned in pydict format: a dict of {"column name": value} for each row in the data.

Note

The produced dataset is meant to be used with a single-process DataLoader (num_workers=0); it does not support the data sharding hooks needed for multi-process data loading.

Keep in mind that Daft already uses multithreading or multiprocessing under the hood to compute the data stream that feeds this dataset.

Note

This method returns results locally. For distributed training, you may want to use DataFrame.to_ray_dataset().