Data Types

class DataType

A Daft DataType defines the type of all the values in an Expression or DataFrame column.

classmethod binary() → DataType

Create a Binary DataType: A string of bytes.

classmethod bool() → DataType

Create the Boolean DataType: Either True or False.

classmethod date() → DataType

Create a Date DataType: A date with a year, month and day.

classmethod decimal128(precision: int, scale: int) → DataType

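Create a Decimal128 DataType: a fixed-precision decimal number, where precision is the total number of digits and scale is the number of digits after the decimal point.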
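As a rough illustration of what precision and scale mean, the sketch below uses Python's standard decimal module (not Daft) to check whether a value fits a given precision and scale. The helper fits_decimal128 is hypothetical; it assumes the usual SQL/Arrow meaning of precision (total number of digits) and scale (digits after the decimal point).

```python
from decimal import Decimal


def fits_decimal128(value: Decimal, precision: int, scale: int) -> bool:
    """Hypothetical helper: True if `value` is representable with this precision and scale."""
    # Round to `scale` fractional digits, as a fixed-point column would store it.
    quantized = value.quantize(Decimal(1).scaleb(-scale))
    # The unscaled integer (12345678.99 at scale 2 -> 1234567899) must fit in `precision` digits.
    unscaled = int(quantized.scaleb(scale))
    return len(str(abs(unscaled))) <= precision


print(fits_decimal128(Decimal("12345678.99"), precision=10, scale=2))   # True
print(fits_decimal128(Decimal("123456789.99"), precision=10, scale=2))  # False
```

With precision=10 and scale=2, at most 8 digits remain for the integer part, which is why the second value is rejected.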

property dtype: DataType

If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.dtype == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.dtype
... except AttributeError:
...     pass

classmethod duration(timeunit: daft.datatype.TimeUnit | str) → DataType

Create a Duration DataType: a length of time measured in the given timeunit.
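A duration is naturally represented as an integer count of its timeunit. The stdlib-only sketch below converts a datetime.timedelta to such a count. The helper duration_to_count and the unit table are hypothetical, and the assumption that durations are stored this way follows the Arrow duration convention rather than anything stated on this page.

```python
from datetime import timedelta

# Hypothetical unit table: how many ticks of each timeunit fit in one second.
TICKS_PER_SECOND = {"ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def duration_to_count(delta: timedelta, timeunit: str) -> int:
    """Express `delta` as an integer number of `timeunit` ticks (illustrative sketch)."""
    # timedelta holds days, seconds and microseconds exactly, so start from microseconds.
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    # Pure integer arithmetic; any remainder finer than `timeunit` is floored.
    return total_us * TICKS_PER_SECOND[timeunit] // 1_000_000


print(duration_to_count(timedelta(seconds=1, milliseconds=500), "ms"))  # 1500
print(duration_to_count(timedelta(seconds=1, milliseconds=500), "ns"))  # 1500000000
```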

classmethod embedding(dtype: DataType, size: int) → DataType

Create an Embedding DataType: embeddings are fixed-size arrays in which every element has a numeric dtype and every array has exactly size elements.

Parameters:
  • dtype – DataType of each element in the list (must be numeric)

  • size – length of each list
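The Embedding contract (every row has exactly size numeric elements) can be checked in plain Python before data is loaded. The helper is_valid_embedding below is hypothetical and not part of Daft; treating bool as non-numeric is an assumption of this sketch.

```python
from numbers import Real
from typing import Sequence


def is_valid_embedding(values: Sequence[object], size: int) -> bool:
    """Return True if `values` has exactly `size` elements and all of them are numeric."""
    # bool is a subclass of int in Python; this sketch deliberately rejects it.
    return len(values) == size and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )


print(is_valid_embedding([0.1, 0.2, 0.3], size=3))  # True
print(is_valid_embedding([0.1, 0.2], size=3))       # False: wrong length
print(is_valid_embedding(["a", "b", "c"], size=3))  # False: not numeric
```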

property fields: dict[str, daft.datatype.DataType]

If this is a struct type, return the fields, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> fields = dtype.fields
>>> assert fields["a"] == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.fields
... except AttributeError:
...     pass

classmethod fixed_size_binary(size: int) → DataType

Create a FixedSizeBinary DataType: A fixed-size string of bytes.

classmethod fixed_size_list(dtype: DataType, size: int) → DataType

Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.

Parameters:
  • dtype – DataType of each element in the list

  • size – length of each list
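Because every list has the same length, a fixed-size list column can be stored as one flat buffer of elements, with row i located purely by arithmetic on size. The sketch below models that Arrow-style FixedSizeList layout in plain Python; the helper names are hypothetical, and the layout is an assumption about the physical representation, which this page does not describe.

```python
from typing import Any, List, Sequence


def to_fixed_size_layout(lists: Sequence[Sequence[Any]], size: int) -> List[Any]:
    """Flatten equal-length lists into a single child buffer."""
    flat: List[Any] = []
    for i, item in enumerate(lists):
        if len(item) != size:
            raise ValueError(f"row {i} has length {len(item)}, expected {size}")
        flat.extend(item)
    return flat


def get_row(flat: Sequence[Any], size: int, i: int) -> List[Any]:
    """Row i starts at i * size; no offsets buffer is needed."""
    return list(flat[i * size:(i + 1) * size])


flat = to_fixed_size_layout([[1, 2, 3], [4, 5, 6]], size=3)
print(flat)                   # [1, 2, 3, 4, 5, 6]
print(get_row(flat, 3, 1))    # [4, 5, 6]
```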

classmethod float32() → DataType

Create a 32-bit float DataType.

classmethod float64() → DataType

Create a 64-bit float DataType.

classmethod from_arrow_type(arrow_type: DataType) → DataType

Maps a PyArrow DataType to a Daft DataType.

classmethod from_numpy_dtype(np_type: np.dtype) → DataType

Maps a NumPy dtype to a Daft DataType.

classmethod image(mode: Optional[Union[str, ImageMode]] = None, height: Optional[int] = None, width: Optional[int] = None) → DataType

Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

Each image in the array has an ImageMode, which describes the pixel dtype (e.g. uint8) and the number of image channels/bands and their logical interpretation (e.g. RGB).

If the height, width, and mode are the same for all images in the array, specifying them when constructing this type is advised, since that will allow Daft to create a more optimized physical representation of the image array.

If the height, width, or mode may vary across images in the array, leaving these fields unspecified when creating this type will cause Daft to represent this image array as a heterogeneous collection of images, where each image can have a different mode, height, and width. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:
  • mode – The mode of the image. By default, this is inferred from the underlying data. If height and width are specified, the mode must also be specified.

  • height – The height of the image. By default, this is inferred from the underlying data. Must be specified if the width is specified.

  • width – The width of the image. By default, this is inferred from the underlying data. Must be specified if the height is specified.
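The choice between a fixed-shape and a heterogeneous image type depends on whether every image shares one (mode, height, width) triple. The plain-Python sketch below shows the (height, width, channel) shape implied by a mode and checks for a shared triple. The channel table covers only a few common modes and is an assumption; consult daft.ImageMode for the full list. Both helpers are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed channel counts for a few common 8-bit modes (not an exhaustive list).
CHANNELS: Dict[str, int] = {"L": 1, "LA": 2, "RGB": 3, "RGBA": 4}


def pixel_array_shape(mode: str, height: int, width: int) -> Tuple[int, int, int]:
    """Each image is a (height, width, channel) array; the mode fixes the channel count."""
    return (height, width, CHANNELS[mode])


def fixed_image_params(images: Iterable[Tuple[str, int, int]]) -> Optional[Tuple[str, int, int]]:
    """Return the shared (mode, height, width) if every image agrees, else None.

    A shared triple is the case where passing mode, height and width is advised;
    None means they are better left unspecified.
    """
    distinct = set(images)
    return distinct.pop() if len(distinct) == 1 else None


print(pixel_array_shape("RGB", 224, 224))                               # (224, 224, 3)
print(fixed_image_params([("RGB", 224, 224), ("RGB", 224, 224)]))       # ('RGB', 224, 224)
print(fixed_image_params([("RGB", 224, 224), ("L", 224, 224)]))         # None
```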

property image_mode: daft.daft.ImageMode | None

If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB")
>>> assert dtype.image_mode == daft.ImageMode.RGB
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.image_mode
... except AttributeError:
...     pass

classmethod int16() → DataType

Create a 16-bit integer DataType.

classmethod int32() → DataType

Create a 32-bit integer DataType.

classmethod int64() → DataType

Create a 64-bit integer DataType.

classmethod int8() → DataType

Create an 8-bit integer DataType.

classmethod interval() → DataType

Create an Interval DataType.

is_binary() → builtins.bool

Check if this is a binary type.

Example

>>> import daft
>>> dtype = daft.DataType.binary()
>>> assert dtype.is_binary()

is_boolean() → builtins.bool

Check if this is a boolean type.

Example

>>> import daft
>>> dtype = daft.DataType.bool()
>>> assert dtype.is_boolean()

is_date() → builtins.bool

Check if this is a date type.

Example

>>> import daft
>>> dtype = daft.DataType.date()
>>> assert dtype.is_date()

is_decimal128() → builtins.bool

Check if this is a decimal128 type.

Example

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.is_decimal128()

is_duration() → builtins.bool

Check if this is a duration type.

Example

>>> import daft
>>> dtype = daft.DataType.duration(timeunit="ns")
>>> assert dtype.is_duration()

is_embedding() → builtins.bool

Check if this is an embedding type.

Example

>>> import daft
>>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
>>> assert dtype.is_embedding()

is_extension() → builtins.bool

Check if this is an extension type.

Example

>>> import daft
>>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
>>> assert dtype.is_extension()

is_fixed_shape_image() → builtins.bool

Check if this is a fixed shape image type.

Example

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
>>> assert dtype.is_fixed_shape_image()

is_fixed_shape_sparse_tensor() → builtins.bool

Check if this is a fixed shape sparse tensor type.

Example

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_sparse_tensor()

is_fixed_shape_tensor() → builtins.bool

Check if this is a fixed shape tensor type.

Example

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_tensor()

is_fixed_size_binary() → builtins.bool

Check if this is a fixed size binary type.

Example

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.is_fixed_size_binary()

is_fixed_size_list() → builtins.bool

Check if this is a fixed size list type.

Example

>>> import daft
>>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
>>> assert dtype.is_fixed_size_list()

is_float32() → builtins.bool

Check if this is a 32-bit float type.

Example

>>> import daft
>>> dtype = daft.DataType.float32()
>>> assert dtype.is_float32()

is_float64() → builtins.bool

Check if this is a 64-bit float type.

Example

>>> import daft
>>> dtype = daft.DataType.float64()
>>> assert dtype.is_float64()

is_image() → builtins.bool

Check if this is an image type.

Example

>>> import daft
>>> dtype = daft.DataType.image()
>>> assert dtype.is_image()

is_int16() → builtins.bool

Check if this is a 16-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.int16()
>>> assert dtype.is_int16()

is_int32() → builtins.bool

Check if this is a 32-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.int32()
>>> assert dtype.is_int32()

is_int64() → builtins.bool

Check if this is a 64-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.int64()
>>> assert dtype.is_int64()

is_int8() → builtins.bool

Check if this is an 8-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.int8()
>>> assert dtype.is_int8()

is_integer() → builtins.bool

Check if this is an integer type.

Example

>>> import daft
>>> dtype = daft.DataType.int64()
>>> assert dtype.is_integer()

is_interval() → builtins.bool

Check if this is an interval type.

Example

>>> import daft
>>> dtype = daft.DataType.interval()
>>> assert dtype.is_interval()

is_list() → builtins.bool

Check if this is a list type.

Example

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.is_list()

is_logical() → builtins.bool

Check if this is a logical type.

Example

>>> import daft
>>> dtype = daft.DataType.bool()
>>> assert not dtype.is_logical()

is_map() → builtins.bool

Check if this is a map type.

Example

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.is_map()

is_null() → builtins.bool

Check if this is a null type.

Example

>>> import daft
>>> dtype = daft.DataType.null()
>>> dtype.is_null()
True

is_numeric() → builtins.bool

Check if this is a numeric type.

Example

>>> import daft
>>> dtype = daft.DataType.float64()
>>> assert dtype.is_numeric()

is_python() → builtins.bool

Check if this is a python object type.

Example

>>> import daft
>>> dtype = daft.DataType.python()
>>> assert dtype.is_python()

is_sparse_tensor() → builtins.bool

Check if this is a sparse tensor type.

Example

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
>>> assert dtype.is_sparse_tensor()

is_string() → builtins.bool

Check if this is a string type.

Example

>>> import daft
>>> dtype = daft.DataType.string()
>>> assert dtype.is_string()

is_struct() → builtins.bool

Check if this is a struct type.

Example

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> assert dtype.is_struct()

is_temporal() → builtins.bool

Check if this is a temporal type.

Example

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_temporal()

is_tensor() → builtins.bool

Check if this is a tensor type.

Example

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> assert dtype.is_tensor()

is_time() → builtins.bool

Check if this is a time type.

Example

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> assert dtype.is_time()

is_timestamp() → builtins.bool

Check if this is a timestamp type.

Example

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_timestamp()

is_uint16() → builtins.bool

Check if this is an unsigned 16-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.uint16()
>>> assert dtype.is_uint16()

is_uint32() → builtins.bool

Check if this is an unsigned 32-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.uint32()
>>> assert dtype.is_uint32()

is_uint64() → builtins.bool

Check if this is an unsigned 64-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.uint64()
>>> assert dtype.is_uint64()

is_uint8() → builtins.bool

Check if this is an unsigned 8-bit integer type.

Example

>>> import daft
>>> dtype = daft.DataType.uint8()
>>> assert dtype.is_uint8()

property key_type: DataType

If this is a map type, return the key type, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.key_type == daft.DataType.string()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.key_type
... except AttributeError:
...     pass

classmethod list(dtype: DataType) → DataType

Create a List DataType: Variable-length list, where each element in the list has type dtype.

Parameters:

dtype – DataType of each element in the list
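Unlike FixedSizeList, a variable-length list has to record where each row starts and ends. The sketch below models the offsets-based layout used by Arrow list arrays in plain Python; whether Daft uses exactly this layout internally is an assumption here, and the helper names are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def to_list_layout(lists: Sequence[Sequence[Any]]) -> Tuple[List[Any], List[int]]:
    """Flatten variable-length lists into a values buffer plus an offsets buffer."""
    values: List[Any] = []
    offsets = [0]  # row i spans values[offsets[i]:offsets[i + 1]]
    for item in lists:
        values.extend(item)
        offsets.append(len(values))
    return values, offsets


def get_list(values: Sequence[Any], offsets: Sequence[int], i: int) -> List[Any]:
    """Recover row i by slicing between its two offsets."""
    return list(values[offsets[i]:offsets[i + 1]])


values, offsets = to_list_layout([[1, 2], [], [3, 4, 5]])
print(values)                        # [1, 2, 3, 4, 5]
print(offsets)                       # [0, 2, 2, 5]
print(get_list(values, offsets, 2))  # [3, 4, 5]
```

An empty row simply repeats the previous offset, which is why offsets has one more entry than there are rows.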

classmethod map(key_type: DataType, value_type: DataType) → DataType

Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

Parameters:
  • key_type – DataType of the keys in the map

  • value_type – DataType of the values in the map
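The description above says a Map is implemented as a list of structs with key and value fields. The plain-Python sketch below shows that correspondence for a single row; the helper names are hypothetical and are not Daft functions.

```python
from typing import Any, Dict, List


def to_map_entries(mapping: Dict[Any, Any]) -> List[Dict[str, Any]]:
    """Represent one map row as a list of {"key": ..., "value": ...} structs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def from_map_entries(entries: List[Dict[str, Any]]) -> Dict[Any, Any]:
    """Rebuild a dict from key/value structs (a later duplicate key wins)."""
    return {entry["key"]: entry["value"] for entry in entries}


entries = to_map_entries({"a": 1, "b": 2})
print(entries)                    # [{'key': 'a', 'value': 1}, {'key': 'b', 'value': 2}]
print(from_map_entries(entries))  # {'a': 1, 'b': 2}
```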

classmethod null() → DataType

Create the Null DataType: Always the Null value.

property precision: int

If this is a decimal type, return the precision, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.precision == 10
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.precision
... except AttributeError:
...     pass

classmethod python() → DataType

Create a Python DataType: a type which refers to an arbitrary Python object.

property scale: int

If this is a decimal type, return the scale, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.scale == 2
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.scale
... except AttributeError:
...     pass

property shape: tuple[int, ...]

If this is a fixed shape type, return the shape, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.shape == (2, 3)
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> try:
...     dtype.shape
... except AttributeError:
...     pass

property size: int

If this is a fixed size type, return the size, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.size == 10
>>> dtype = daft.DataType.binary()
>>> try:
...     dtype.size
... except AttributeError:
...     pass

classmethod sparse_tensor(dtype: DataType, shape: tuple[int, ...] | None = None, use_offset_indices: builtins.bool = False) → DataType

Create a SparseTensor DataType: sparse tensor arrays store n-dimensional arrays whose elements have the provided dtype, each of the provided shape, using a COO (coordinate) sparse tensor representation.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

The use_offset_indices parameter determines how the indices of the SparseTensor are stored:
  • False (default): Indices represent the actual positions of nonzero values.

  • True: Indices represent the offsets between consecutive nonzero values. This can improve compression efficiency, especially when nonzero values are clustered together, as offsets between them are often zero, making them easier to compress.

Parameters:
  • dtype – The type of the data contained within the tensor elements.

  • shape – The shape of each SparseTensor in the column. This is None by default, which allows the shapes of each tensor element to vary.

  • use_offset_indices – Determines how indices are represented. Defaults to False (storing actual indices). If True, stores offsets between nonzero indices.
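The sketch below illustrates the COO idea and the two index encodings in plain Python. It assumes indices are positions in the row-major flattened tensor and that offset indices are a simple delta encoding (the first entry is the first index, each later entry is the gap to the previous index). Both are assumptions made for illustration, Daft's exact encoding may differ, and all helper names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def to_coo(flat: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Keep only nonzero values and their positions in the flattened tensor."""
    values = [v for v in flat if v != 0]
    indices = [i for i, v in enumerate(flat) if v != 0]
    return values, indices


def indices_to_offsets(indices: List[int]) -> List[int]:
    """Delta-encode sorted indices relative to the previous index (0 before the first)."""
    return [idx - prev for prev, idx in zip([0] + indices[:-1], indices)]


def offsets_to_indices(offsets: List[int]) -> List[int]:
    """Invert the delta encoding with a running sum."""
    return list(accumulate(offsets))


# A 2x3 tensor [[0, 5, 0], [0, 0, 7]] flattened in row-major order.
values, indices = to_coo([0, 5, 0, 0, 0, 7])
offsets = indices_to_offsets(indices)
print(values, indices, offsets)     # [5, 7] [1, 5] [1, 4]
print(offsets_to_indices(offsets))  # [1, 5]
```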

classmethod string() → DataType

Create a String DataType: A string of UTF-8 characters.

classmethod struct(fields: dict[str, daft.datatype.DataType]) → DataType

Create a Struct DataType: a nested type which has names mapped to child types.

Example

>>> from daft import DataType
>>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})

Parameters:

fields – Nested fields of the Struct

classmethod tensor(dtype: DataType, shape: Optional[tuple[int, ...]] = None) → DataType

Create a Tensor DataType: tensor arrays contain n-dimensional arrays whose elements have the provided dtype, each of the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:
  • dtype – The type of the data contained within the tensor elements.

  • shape – The shape of each tensor in the column. This is None by default, which allows the shapes of each tensor element to vary.
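Whether to pass shape comes down to whether every tensor in the column has the same shape. The sketch below infers the shape of rectangular nested Python lists and reports a common shape when there is one. nested_shape and common_shape are hypothetical helpers, and the code assumes its inputs are rectangular rather than validating that.

```python
from typing import Any, Iterable, List, Optional, Tuple


def nested_shape(x: Any) -> Tuple[int, ...]:
    """Shape of a rectangular nested list, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    shape: List[int] = []
    while isinstance(x, list):
        shape.append(len(x))
        # Descend into the first element; an empty list ends the descent.
        x = x[0] if x else None
    return tuple(shape)


def common_shape(tensors: Iterable[Any]) -> Optional[Tuple[int, ...]]:
    """Return the shape shared by all tensors, or None if shapes vary (or there are none)."""
    shapes = {nested_shape(t) for t in tensors}
    return shapes.pop() if len(shapes) == 1 else None


print(common_shape([[[1, 2, 3], [4, 5, 6]], [[0, 0, 0], [1, 1, 1]]]))  # (2, 3)
print(common_shape([[[1, 2, 3], [4, 5, 6]], [[1, 2], [3, 4]]]))        # None
```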

classmethod time(timeunit: daft.datatype.TimeUnit | str) → DataType

Create a Time DataType. Supported timeunits are “us” and “ns”.
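A time of day at a given timeunit can be thought of as an integer count since midnight, which is how Arrow's time64 type stores it; that Daft follows the same convention is an assumption of this sketch. time_to_count is a hypothetical helper covering the two supported units.

```python
from datetime import time

# Ticks per second for the two timeunits this type supports.
TICKS_PER_SECOND = {"us": 1_000_000, "ns": 1_000_000_000}


def time_to_count(t: time, timeunit: str) -> int:
    """Number of `timeunit` ticks elapsed since midnight (illustrative sketch)."""
    if timeunit not in TICKS_PER_SECOND:
        raise ValueError(f"unsupported timeunit for Time: {timeunit!r}")
    total_us = ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond
    return total_us * TICKS_PER_SECOND[timeunit] // 1_000_000


print(time_to_count(time(0, 0, 1, 500), "us"))  # 1000500
print(time_to_count(time(0, 0, 1, 500), "ns"))  # 1000500000
```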

classmethod timestamp(timeunit: daft.datatype.TimeUnit | str, timezone: Optional[str] = None) → DataType

Create a Timestamp DataType with the given timeunit and an optional timezone.
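Following the Arrow convention (assumed here, not stated on this page), a timestamp is an integer count of timeunit ticks since the Unix epoch, and a timezone changes how that instant is displayed rather than the stored count. The stdlib sketch below shows this. timestamp_to_count is hypothetical, and reading naive datetimes as UTC is a simplifying assumption.

```python
from datetime import datetime, timedelta, timezone

TICKS_PER_SECOND = {"ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_count(dt: datetime, timeunit: str) -> int:
    """Ticks of `timeunit` since 1970-01-01T00:00:00Z (illustrative sketch)."""
    if dt.tzinfo is None:
        # Simplifying assumption: naive datetimes are read as UTC wall-clock time.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return total_us * TICKS_PER_SECOND[timeunit] // 1_000_000


utc_midnight = datetime(1970, 1, 2, tzinfo=timezone.utc)
plus_one_one_am = datetime(1970, 1, 2, 1, tzinfo=timezone(timedelta(hours=1)))
print(timestamp_to_count(utc_midnight, "ms"))     # 86400000
print(timestamp_to_count(plus_one_one_am, "ms"))  # 86400000: same instant, different offset
```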

property timeunit: TimeUnit

If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> dtype.timeunit
TimeUnit(ns)
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timeunit
... except AttributeError:
...     pass

property timezone: str | None

If this is a timestamp type, return the timezone, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns", timezone="UTC")
>>> assert dtype.timezone == "UTC"
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timezone
... except AttributeError:
...     pass

classmethod uint16() → DataType

Create an unsigned 16-bit integer DataType.

classmethod uint32() → DataType

Create an unsigned 32-bit integer DataType.

classmethod uint64() → DataType

Create an unsigned 64-bit integer DataType.

classmethod uint8() → DataType

Create an unsigned 8-bit integer DataType.

property use_offset_indices: builtins.bool

If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), use_offset_indices=True)
>>> assert dtype.use_offset_indices
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.use_offset_indices
... except AttributeError:
...     pass

property value_type: DataType

If this is a map type, return the value type, otherwise an attribute error is raised.

Example

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.value_type == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.value_type
... except AttributeError:
...     pass