DataTypes
- class DataType
A Daft DataType defines the type of all the values in an Expression or DataFrame column.
- classmethod embedding(dtype: DataType, size: int) -> DataType
Create an Embedding DataType: embeddings are fixed-size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.
- Parameters:
dtype – DataType of each element in the array (must be numeric)
size – length of each array
- classmethod fixed_size_binary(size: int) -> DataType
Create a FixedSizeBinary DataType: a fixed-size string of bytes.
- classmethod fixed_size_list(dtype: DataType, size: int) -> DataType
Create a FixedSizeList DataType: a fixed-size list, where each element in the list has type dtype and each list has length size.
- Parameters:
dtype – DataType of each element in the list
size – length of each list
- classmethod from_arrow_type(arrow_type: DataType) -> DataType
Maps a PyArrow DataType to a Daft DataType.
- classmethod from_numpy_dtype(np_type: np.dtype) -> DataType
Maps a NumPy dtype to a Daft DataType.
- classmethod image(mode: Optional[Union[str, ImageMode]] = None, height: Optional[int] = None, width: Optional[int] = None) -> DataType
Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.
Each image in the array has an ImageMode, which describes the pixel dtype (e.g. uint8) and the number of image channels/bands and their logical interpretation (e.g. RGB).
If the height, width, and mode are the same for all images in the array, specifying them when constructing this type is advised, since that allows Daft to create a more optimized physical representation of the image array.
If the height, width, or mode may vary across images in the array, leaving these fields unspecified when creating this type causes Daft to represent the image array as a heterogeneous collection of images, where each image can have a different mode, height, and width. This is much more flexible, but results in a less compact representation and may make some operations less efficient.
- Parameters:
mode – The mode of the image. By default, this is inferred from the underlying data. If height and width are specified, the mode must also be specified.
height – The height of the image. By default, this is inferred from the underlying data. Must be specified if the width is specified.
width – The width of the image. By default, this is inferred from the underlying data. Must be specified if the height is specified.
- classmethod list(dtype: DataType) -> DataType
Create a List DataType: a variable-length list, where each element in the list has type dtype.
- Parameters:
dtype – DataType of each element in the list
- classmethod map(key_type: DataType, value_type: DataType) -> DataType
Create a Map DataType: a map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.
- Parameters:
key_type – DataType of the keys in the map
value_type – DataType of the values in the map
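The "list of structs with two fields, key and value" layout can be sketched in plain Python. This only models the logical shape of a single map value; the helper names `dict_to_entries` and `entries_to_dict` are hypothetical and are not part of Daft.

```python
# Hypothetical helpers: a plain-Python model of the layout, not Daft internals.
def dict_to_entries(mapping):
    """Encode one map value as a list of {"key", "value"} structs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def entries_to_dict(entries):
    """Decode the list-of-structs form back into a dict."""
    return {entry["key"]: entry["value"] for entry in entries}


row = {"a": 1, "b": 2}
entries = dict_to_entries(row)
print(entries)  # [{'key': 'a', 'value': 1}, {'key': 'b', 'value': 2}]
```

Each entry is one struct, so a map column behaves like a list column whose elements always carry exactly the two fields key and value.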
- classmethod python() -> DataType
Create a Python DataType: a type which refers to an arbitrary Python object.
- classmethod sparse_tensor(dtype: DataType, shape: Optional[tuple[int, ...]] = None) -> DataType
Create a SparseTensor DataType: SparseTensor arrays use the "COO Sparse Tensor" representation of n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.
If a shape is given, each ndarray in the column will have this shape.
If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but results in a less compact representation and may make some operations less efficient.
- Parameters:
dtype – The type of the data contained within the tensor elements.
shape – The shape of each SparseTensor in the column. This is None by default, which allows the shapes of each tensor element to vary.
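As a rough illustration of the COO (coordinate) idea, the sketch below stores only the non-zero entries of a nested-list array as coordinates plus values. It is a plain-Python model under that assumption; Daft's actual physical layout may differ, and `to_coo` and `from_coo` are hypothetical helper names.

```python
from itertools import product


def to_coo(dense, shape):
    """Return (indices, values) for the non-zero entries of a nested-list array."""
    indices, values = [], []
    for idx in product(*(range(n) for n in shape)):
        item = dense
        for i in idx:  # walk down one axis at a time
            item = item[i]
        if item != 0:
            indices.append(idx)
            values.append(item)
    return indices, values


def from_coo(indices, values, shape):
    """Rebuild the dense nested list from COO coordinates and values."""
    def zeros(dims):
        if len(dims) == 1:
            return [0] * dims[0]
        return [zeros(dims[1:]) for _ in range(dims[0])]

    dense = zeros(shape)
    for idx, value in zip(indices, values):
        target = dense
        for i in idx[:-1]:
            target = target[i]
        target[idx[-1]] = value
    return dense


dense = [[0, 0, 3], [4, 0, 0]]
indices, values = to_coo(dense, (2, 3))
print(indices)  # [(0, 2), (1, 0)]
print(values)   # [3, 4]
```

The round trip through `from_coo` needs the shape as well, since the coordinates alone do not record trailing all-zero rows or columns.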
- classmethod struct(fields: dict[str, daft.datatype.DataType]) -> DataType
Create a Struct DataType: a nested type which has names mapped to child types.
Example:
>>> DataType.struct({"name": DataType.string(), "age": DataType.int64()})
- Parameters:
fields – Nested fields of the Struct
- classmethod tensor(dtype: DataType, shape: Optional[tuple[int, ...]] = None) -> DataType
Create a Tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.
If a shape is given, each ndarray in the column will have this shape.
If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but results in a less compact representation and may make some operations less efficient.
- Parameters:
dtype – The type of the data contained within the tensor elements.
shape – The shape of each tensor in the column. This is None by default, which allows the shapes of each tensor element to vary.
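One way to picture why a fixed shape is more compact: with a shared shape, it only needs to be recorded once, whereas variable shapes must be stored alongside every element. The sketch below models this in plain Python. It is an assumption-laden illustration, not Daft's actual storage format, and all helper names are hypothetical.

```python
def flatten(nested):
    """Flatten a nested list of numbers in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    return [x for item in nested for x in flatten(item)]


def shape_of(nested):
    """Infer the shape of a non-empty, rectangular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def pack_fixed(rows, shape):
    """Fixed shape: one shared shape plus flat data per row."""
    for row in rows:
        if shape_of(row) != shape:
            raise ValueError(f"expected shape {shape}, got {shape_of(row)}")
    return {"shape": shape, "data": [flatten(row) for row in rows]}


def pack_variable(rows):
    """Variable shape: every row carries its own shape next to its data."""
    return [{"shape": shape_of(row), "data": flatten(row)} for row in rows]


same = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
mixed = [[[1, 2], [3, 4]], [[5, 6, 7]]]
print(pack_fixed(same, (2, 2)))
print(pack_variable(mixed))
```

In this model, `pack_fixed` rejects any row whose shape differs from the declared one, mirroring the guarantee that every ndarray in a fixed-shape column has the given shape.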
- classmethod time(timeunit: daft.datatype.TimeUnit | str) -> DataType
Create a Time DataType. Supported time units are "us" and "ns".