DataTypes#

class DataType[source]#

A Daft DataType defines the type of all the values in an Expression or DataFrame column

classmethod binary() DataType[source]#

Create a Binary DataType: A string of bytes

classmethod bool() DataType[source]#

Create the Boolean DataType: Either True or False

classmethod date() DataType[source]#

Create a Date DataType: A date with a year, month and day

classmethod decimal128(precision: int, scale: int) DataType[source]#

Fixed-precision decimal.
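
As an illustration of the two parameters, the sketch below assumes the usual decimal convention, which the description above does not spell out: precision is the total number of digits and scale is the number of digits after the decimal point. The helper fits_decimal is hypothetical, uses only Python's standard decimal module, and is not part of Daft.

```python
from decimal import Decimal


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether `value` fits in `precision` total digits with `scale` fractional digits.

    Hypothetical helper for illustration only; assumes the common
    precision/scale convention rather than anything Daft-specific.
    """
    if not value.is_finite():
        return False
    # Drop trailing zeros so that 1.500 is treated like 1.5.
    _sign, digits, exponent = value.normalize().as_tuple()
    # Number of digits after the decimal point.
    frac_digits = max(0, -exponent)
    if frac_digits > scale:
        return False
    # Number of digits before the decimal point.
    int_digits = max(0, len(digits) + exponent)
    return int_digits <= precision - scale


print(fits_decimal(Decimal("123.45"), precision=5, scale=2))  # True
print(fits_decimal(Decimal("1234.5"), precision=5, scale=2))  # False
```

Under this convention, precision=5 and scale=2 leave three digits for the integer part, so 1234.5 does not fit even though it has only five digits in total.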

classmethod duration(timeunit: daft.datatype.TimeUnit | str) DataType[source]#

Duration DataType.

classmethod embedding(dtype: DataType, size: int) DataType[source]#

Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.

Parameters:
  • dtype – DataType of each element in the list (must be numeric)

  • size – length of each list

classmethod fixed_size_binary(size: int) DataType[source]#

Create a FixedSizeBinary DataType: A fixed-size string of bytes

classmethod fixed_size_list(dtype: DataType, size: int) DataType[source]#

Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.

Parameters:
  • dtype – DataType of each element in the list

  • size – length of each list

classmethod float32() DataType[source]#

Create a 32-bit float DataType

classmethod float64() DataType[source]#

Create a 64-bit float DataType

classmethod from_arrow_type(arrow_type: DataType) DataType[source]#

Maps a PyArrow DataType to a Daft DataType

classmethod from_numpy_dtype(np_type: np.dtype) DataType[source]#

Maps a Numpy datatype to a Daft DataType

classmethod image(mode: Optional[Union[str, ImageMode]] = None, height: Optional[int] = None, width: Optional[int] = None) DataType[source]#

Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

Each image in the array has an ImageMode, which describes the pixel dtype (e.g. uint8) and the number of image channels/bands and their logical interpretation (e.g. RGB).

If the height, width, and mode are the same for all images in the array, specifying them when constructing this type is advised, since that will allow Daft to create a more optimized physical representation of the image array.

If the height, width, or mode may vary across images in the array, leaving these fields unspecified when creating this type will cause Daft to represent this image array as a heterogeneous collection of images, where each image can have a different mode, height, and width. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:
  • mode – The mode of the image. By default, this is inferred from the underlying data. If height and width are specified, the mode must also be specified.

  • height – The height of the image. By default, this is inferred from the underlying data. Must be specified if the width is specified.

  • width – The width of the image. By default, this is inferred from the underlying data. Must be specified if the height is specified.
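
The argument rules described above can be summarized in a small sketch. The function check_image_args and its return labels are hypothetical and written only from the documented rules; this is not Daft's actual validation code.

```python
from typing import Optional


def check_image_args(
    mode: Optional[str] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> str:
    """Apply the documented rules and report which representation would apply.

    Hypothetical helper for illustration; the returned labels are not Daft names.
    """
    # Height and width must be given together or not at all.
    if (height is None) != (width is None):
        raise ValueError("height and width must be specified together")
    # A fixed height and width also require a mode.
    if height is not None and mode is None:
        raise ValueError("mode must be specified when height and width are specified")
    # With all three fixed, a more compact fixed-shape layout is possible.
    return "fixed-shape" if height is not None else "variable-shape"


print(check_image_args())                  # variable-shape
print(check_image_args("RGB"))             # variable-shape
print(check_image_args("RGB", 224, 224))   # fixed-shape
```

Calling the helper with only a height, or with a height and width but no mode, raises ValueError.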

classmethod int16() DataType[source]#

Create a 16-bit integer DataType

classmethod int32() DataType[source]#

Create a 32-bit integer DataType

classmethod int64() DataType[source]#

Create a 64-bit integer DataType

classmethod int8() DataType[source]#

Create an 8-bit integer DataType

classmethod interval() DataType[source]#

Interval DataType.

classmethod list(dtype: DataType) DataType[source]#

Create a List DataType: Variable-length list, where each element in the list has type dtype

Parameters:
  • dtype – DataType of each element in the list

classmethod map(key_type: DataType, value_type: DataType) DataType[source]#

Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

Parameters:
  • key_type – DataType of the keys in the map

  • value_type – DataType of the values in the map
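
To make the "list of structs with two fields, key and value" layout concrete, here is a plain-Python sketch that converts a dict to that shape and back. The helper names are hypothetical and no Daft code is involved; Daft's physical storage may differ in detail.

```python
from typing import Any, Dict, List


def dict_to_entries(mapping: Dict[Any, Any]) -> List[Dict[str, Any]]:
    """Represent a mapping as a list of {"key": ..., "value": ...} structs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def entries_to_dict(entries: List[Dict[str, Any]]) -> Dict[Any, Any]:
    """Rebuild the mapping from its list-of-structs form."""
    return {entry["key"]: entry["value"] for entry in entries}


entries = dict_to_entries({"a": 1, "b": 2})
print(entries)  # [{'key': 'a', 'value': 1}, {'key': 'b', 'value': 2}]
print(entries_to_dict(entries))  # {'a': 1, 'b': 2}
```

In this picture, key_type describes every "key" field and value_type describes every "value" field.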

classmethod null() DataType[source]#

Create the Null DataType: Always the Null value

classmethod python() DataType[source]#

Create a Python DataType: a type which refers to an arbitrary Python object

classmethod sparse_tensor(dtype: DataType, shape: Optional[tuple[int, ...]] = None) DataType[source]#

Create a SparseTensor DataType: SparseTensor arrays store n-dimensional arrays of the provided dtype in a COO (coordinate format) sparse representation, each with the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:
  • dtype – The type of the data contained within the tensor elements.

  • shape – The shape of each SparseTensor in the column. This is None by default, which allows the shapes of each tensor element to vary.
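
For readers unfamiliar with the COO (coordinate) format, the sketch below shows the general idea on a 2-D array using plain Python lists: only the non-zero values are stored, together with their indices and the overall shape. The function names are hypothetical, and Daft's actual physical layout is not documented here and may differ.

```python
from typing import List, Tuple

Coo = Tuple[List[Tuple[int, int]], List[float], Tuple[int, int]]


def to_coo(dense: List[List[float]]) -> Coo:
    """Convert a dense 2-D list into (indices, values, shape)."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    indices: List[Tuple[int, int]] = []
    values: List[float] = []
    for i, row in enumerate(dense):
        for j, x in enumerate(row):
            if x != 0:  # keep only non-zero entries
                indices.append((i, j))
                values.append(x)
    return indices, values, (n_rows, n_cols)


def from_coo(indices, values, shape) -> List[List[float]]:
    """Rebuild the dense 2-D list from its COO form."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for (i, j), x in zip(indices, values):
        dense[i][j] = x
    return dense


indices, values, shape = to_coo([[0, 0, 3], [4, 0, 0]])
print(indices, values, shape)  # [(0, 2), (1, 0)] [3, 4] (2, 3)
```

The shape has to be kept alongside the indices and values because trailing all-zero rows or columns leave no trace in the stored entries.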

classmethod string() DataType[source]#

Create a String DataType: A string of UTF8 characters

classmethod struct(fields: dict[str, daft.datatype.DataType]) DataType[source]#

Create a Struct DataType: a nested type which has names mapped to child types

Example:

>>> DataType.struct({"name": DataType.string(), "age": DataType.int64()})

Parameters:
  • fields – Nested fields of the Struct

classmethod tensor(dtype: DataType, shape: Optional[tuple[int, ...]] = None) DataType[source]#

Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:
  • dtype – The type of the data contained within the tensor elements.

  • shape – The shape of each tensor in the column. This is None by default, which allows the shapes of each tensor element to vary.

classmethod time(timeunit: daft.datatype.TimeUnit | str) DataType[source]#

Time DataType. Supported timeunits are "us" and "ns".

classmethod timestamp(timeunit: daft.datatype.TimeUnit | str, timezone: Optional[str] = None) DataType[source]#

Timestamp DataType.

classmethod uint16() DataType[source]#

Create an unsigned 16-bit integer DataType

classmethod uint32() DataType[source]#

Create an unsigned 32-bit integer DataType

classmethod uint64() DataType[source]#

Create an unsigned 64-bit integer DataType

classmethod uint8() DataType[source]#

Create an unsigned 8-bit integer DataType