daft.DataFrame.to_torch_map_dataset

DataFrame.to_torch_map_dataset() → TorchDataset

Convert the current DataFrame into a map-style Torch Dataset for use with PyTorch.

This method materializes the entire DataFrame and blocks until materialization completes.

Items will be returned in pydict format: a dict of {"column name": value} for each row in the data.
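To make the item format concrete, here is a minimal sketch of what "pydict format" rows look like under map-style access. `RowDictDataset` is a hypothetical stand-in written for illustration, not Daft's implementation; it only mirrors the `__len__`/`__getitem__` protocol that PyTorch map-style datasets follow.

```python
class RowDictDataset:
    """Hypothetical stand-in for a map-style dataset over columnar data."""

    def __init__(self, columns):
        # columns: {"column name": [values, ...]}, one list per column
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = columns
        self._len = lengths.pop() if lengths else 0

    def __len__(self):
        # Map-style datasets report their size up front
        return self._len

    def __getitem__(self, idx):
        # Random access: build one {"column name": value} dict per row
        if not -self._len <= idx < self._len:
            raise IndexError(idx)
        return {name: values[idx] for name, values in self._columns.items()}


ds = RowDictDataset({"x": [1, 2, 3], "y": ["a", "b", "c"]})
print(len(ds))  # 3
print(ds[1])    # {'x': 2, 'y': 'b'}
```

With Daft and PyTorch installed, the corresponding real call would look something like `daft.from_pydict({"x": [1, 2, 3], "y": ["a", "b", "c"]}).to_torch_map_dataset()`, and the returned dataset can typically be passed to `torch.utils.data.DataLoader`.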

Note

If you do not need random access, you may get better performance from an IterableDataset (see DataFrame.to_torch_iter_dataset()), which streams data items as soon as they are ready and does not block on full materialization.
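The trade-off described in this note can be sketched in plain Python. The `partitions` generator and its `log` argument are hypothetical, used only to record how much work has been done; this illustrates the general pattern, not how Daft schedules work internally.

```python
def partitions(log):
    # Hypothetical data source: yields one partition of rows at a time
    # and records each partition index in `log` as it is produced.
    data = [[{"x": 1}, {"x": 2}], [{"x": 3}]]
    for i, part in enumerate(data):
        log.append(i)
        yield part


def iter_rows(parts):
    # Flatten partitions into a stream of row dicts
    for part in parts:
        yield from part


# Iterable-style: the first row is available once the first partition is ready
stream_log = []
stream = iter_rows(partitions(stream_log))
first = next(stream)
print(first, stream_log)  # {'x': 1} [0]

# Map-style: every partition is produced before any row can be indexed
map_log = []
materialized = list(iter_rows(partitions(map_log)))
print(materialized[2], map_log)  # {'x': 3} [0, 1]
```

The map-style path pays the full materialization cost up front in exchange for indexing by position, which is what samplers that shuffle by index rely on.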

Note

This method returns results locally. For distributed training, you may want to use DataFrame.to_ray_dataset().