Data Types#
- class DataType[source]#
A Daft DataType defines the type of all the values in an Expression or DataFrame column.
- property dtype: DataType#
If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.dtype == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.dtype
... except AttributeError:
...     pass
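Several properties on this page are documented as raising an attribute error when they do not apply to the type. As a minimal sketch of a more compact alternative to try/except, Python's built-in getattr with a default returns that default when the property raises AttributeError. The StandInType class and fields_or_none helper below are hypothetical and exist only so the sketch runs without Daft; the same pattern should apply to any object whose property raises AttributeError.

```python
class StandInType:
    """Hypothetical stand-in for a type whose `fields` property only applies to structs."""

    def __init__(self, fields=None):
        self._fields = fields

    @property
    def fields(self):
        if self._fields is None:
            # Mirror the documented behavior: a non-applicable property raises AttributeError.
            raise AttributeError("not a struct type")
        return self._fields


def fields_or_none(dtype):
    # getattr with a default swallows the AttributeError raised by the property.
    return getattr(dtype, "fields", None)


struct_like = StandInType(fields={"a": "int64"})
scalar_like = StandInType()
print(fields_or_none(struct_like))
print(fields_or_none(scalar_like))
```

The trade-off is that a default of None cannot be told apart from a property that legitimately returns None, such as an unset timezone.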
- classmethod embedding(dtype: DataType, size: int) DataType [source]#
Create an Embedding DataType: embeddings are fixed-size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.
- Parameters:
dtype – DataType of each element in the list (must be numeric)
size – length of each list
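The two constraints on an embedding (numeric elements, fixed length) can be illustrated with a small pure-Python check. This is only a sketch of the idea, not how Daft validates data; is_valid_embedding is a hypothetical helper name.

```python
from numbers import Real


def is_valid_embedding(values, size):
    """Return True if `values` holds exactly `size` real numbers."""
    if len(values) != size:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(v, Real) and not isinstance(v, bool) for v in values)


print(is_valid_embedding([0.1, 0.2, 0.3], size=3))
print(is_valid_embedding([0.1, 0.2], size=3))
```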
- property fields: dict[str, daft.datatype.DataType]#
If this is a struct type, return the fields, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> fields = dtype.fields
>>> assert fields["a"] == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.fields
... except AttributeError:
...     pass
- classmethod fixed_size_binary(size: int) DataType [source]#
Create a FixedSizeBinary DataType: A fixed-size string of bytes.
- classmethod fixed_size_list(dtype: DataType, size: int) DataType [source]#
Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.
- Parameters:
dtype – DataType of each element in the list
size – length of each list
- classmethod from_arrow_type(arrow_type: DataType) DataType [source]#
Maps a PyArrow DataType to a Daft DataType.
- classmethod from_numpy_dtype(np_type: np.dtype) DataType [source]#
Maps a Numpy datatype to a Daft DataType.
- classmethod image(mode: Optional[Union[str, ImageMode]] = None, height: Optional[int] = None, width: Optional[int] = None) DataType [source]#
Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.
Each image in the array has an ImageMode, which describes the pixel dtype (e.g. uint8) and the number of image channels/bands and their logical interpretation (e.g. RGB).
If the height, width, and mode are the same for all images in the array, specifying them when constructing this type is advised, since that will allow Daft to create a more optimized physical representation of the image array.
If the height, width, or mode may vary across images in the array, leaving these fields unspecified when creating this type will cause Daft to represent this image array as a heterogeneous collection of images, where each image can have a different mode, height, and width. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.
- Parameters:
mode – The mode of the image. By default, this is inferred from the underlying data. If height and width are specified, the mode must also be specified.
height – The height of the image. By default, this is inferred from the underlying data. Must be specified if the width is specified.
width – The width of the image. By default, this is inferred from the underlying data. Must be specified if the height is specified.
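The parameter rules above can be summarized as a small decision function. The sketch below is a hypothetical helper, not Daft's implementation. It reads the height/width rule as symmetric (both given or both omitted), and the returned labels are illustrative only.

```python
from typing import Optional


def check_image_args(
    mode: Optional[str] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> str:
    """Classify an argument combination according to the documented rules."""
    # Assumption: height and width must be given together.
    if (height is None) != (width is None):
        raise ValueError("height and width must be specified together")
    # Documented rule: a fixed height and width also require a mode.
    if height is not None and mode is None:
        raise ValueError("mode must be specified when height and width are given")
    return "fixed-shape" if height is not None else "variable-shape"


print(check_image_args())
print(check_image_args(mode="RGB", height=224, width=224))
```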
- property image_mode: daft.daft.ImageMode | None#
If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.image(mode="RGB")
>>> assert dtype.image_mode == daft.ImageMode.RGB
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.image_mode
... except AttributeError:
...     pass
- is_binary() builtins.bool [source]#
Check if this is a binary type.
Example
>>> import daft
>>> dtype = daft.DataType.binary()
>>> assert dtype.is_binary()
- is_boolean() builtins.bool [source]#
Check if this is a boolean type.
Example
>>> import daft
>>> dtype = daft.DataType.bool()
>>> assert dtype.is_boolean()
- is_date() builtins.bool [source]#
Check if this is a date type.
Example
>>> import daft
>>> dtype = daft.DataType.date()
>>> assert dtype.is_date()
- is_decimal128() builtins.bool [source]#
Check if this is a decimal128 type.
Example
>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.is_decimal128()
- is_duration() builtins.bool [source]#
Check if this is a duration type.
Example
>>> import daft
>>> dtype = daft.DataType.duration(timeunit="ns")
>>> assert dtype.is_duration()
- is_embedding() builtins.bool [source]#
Check if this is an embedding type.
Example
>>> import daft
>>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
>>> assert dtype.is_embedding()
- is_extension() builtins.bool [source]#
Check if this is an extension type.
Example
>>> import daft
>>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
>>> assert dtype.is_extension()
- is_fixed_shape_image() builtins.bool [source]#
Check if this is a fixed shape image type.
Example
>>> import daft
>>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
>>> assert dtype.is_fixed_shape_image()
- is_fixed_shape_sparse_tensor() builtins.bool [source]#
Check if this is a fixed shape sparse tensor type.
Example
>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_sparse_tensor()
- is_fixed_shape_tensor() builtins.bool [source]#
Check if this is a fixed shape tensor type.
Example
>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_tensor()
- is_fixed_size_binary() builtins.bool [source]#
Check if this is a fixed size binary type.
Example
>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.is_fixed_size_binary()
- is_fixed_size_list() builtins.bool [source]#
Check if this is a fixed size list type.
Example
>>> import daft
>>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
>>> assert dtype.is_fixed_size_list()
- is_float32() builtins.bool [source]#
Check if this is a 32-bit float type.
Example
>>> import daft
>>> dtype = daft.DataType.float32()
>>> assert dtype.is_float32()
- is_float64() builtins.bool [source]#
Check if this is a 64-bit float type.
Example
>>> import daft
>>> dtype = daft.DataType.float64()
>>> assert dtype.is_float64()
- is_image() builtins.bool [source]#
Check if this is an image type.
Example
>>> import daft
>>> dtype = daft.DataType.image()
>>> assert dtype.is_image()
- is_int16() builtins.bool [source]#
Check if this is a 16-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.int16()
>>> assert dtype.is_int16()
- is_int32() builtins.bool [source]#
Check if this is a 32-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.int32()
>>> assert dtype.is_int32()
- is_int64() builtins.bool [source]#
Check if this is a 64-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.int64()
>>> assert dtype.is_int64()
- is_int8() builtins.bool [source]#
Check if this is an 8-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.int8()
>>> assert dtype.is_int8()
- is_integer() builtins.bool [source]#
Check if this is an integer type.
Example
>>> import daft
>>> dtype = daft.DataType.int64()
>>> assert dtype.is_integer()
- is_interval() builtins.bool [source]#
Check if this is an interval type.
Example
>>> import daft
>>> dtype = daft.DataType.interval()
>>> assert dtype.is_interval()
- is_list() builtins.bool [source]#
Check if this is a list type.
Example
>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.is_list()
- is_logical() builtins.bool [source]#
Check if this is a logical type.
Example
>>> import daft
>>> dtype = daft.DataType.bool()
>>> assert not dtype.is_logical()
- is_map() builtins.bool [source]#
Check if this is a map type.
Example
>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.is_map()
- is_null() builtins.bool [source]#
Check if this is a null type.
Example
>>> import daft
>>> dtype = daft.DataType.null()
>>> dtype.is_null()
True
- is_numeric() builtins.bool [source]#
Check if this is a numeric type.
Example
>>> import daft
>>> dtype = daft.DataType.float64()
>>> assert dtype.is_numeric()
- is_python() builtins.bool [source]#
Check if this is a python object type.
Example
>>> import daft
>>> dtype = daft.DataType.python()
>>> assert dtype.is_python()
- is_sparse_tensor() builtins.bool [source]#
Check if this is a sparse tensor type.
Example
>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
>>> assert dtype.is_sparse_tensor()
- is_string() builtins.bool [source]#
Check if this is a string type.
Example
>>> import daft
>>> dtype = daft.DataType.string()
>>> assert dtype.is_string()
- is_struct() builtins.bool [source]#
Check if this is a struct type.
Example
>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> assert dtype.is_struct()
- is_temporal() builtins.bool [source]#
Check if this is a temporal type.
Example
>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_temporal()
- is_tensor() builtins.bool [source]#
Check if this is a tensor type.
Example
>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> assert dtype.is_tensor()
- is_time() builtins.bool [source]#
Check if this is a time type.
Example
>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> assert dtype.is_time()
- is_timestamp() builtins.bool [source]#
Check if this is a timestamp type.
Example
>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_timestamp()
- is_uint16() builtins.bool [source]#
Check if this is an unsigned 16-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.uint16()
>>> assert dtype.is_uint16()
- is_uint32() builtins.bool [source]#
Check if this is an unsigned 32-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.uint32()
>>> assert dtype.is_uint32()
- is_uint64() builtins.bool [source]#
Check if this is an unsigned 64-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.uint64()
>>> assert dtype.is_uint64()
- is_uint8() builtins.bool [source]#
Check if this is an unsigned 8-bit integer type.
Example
>>> import daft
>>> dtype = daft.DataType.uint8()
>>> assert dtype.is_uint8()
- property key_type: DataType#
If this is a map type, return the key type, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.key_type == daft.DataType.string()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.key_type
... except AttributeError:
...     pass
- classmethod list(dtype: DataType) DataType [source]#
Create a List DataType: Variable-length list, where each element in the list has type dtype.
- Parameters:
dtype – DataType of each element in the list
- classmethod map(key_type: DataType, value_type: DataType) DataType [source]#
Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.
- Parameters:
key_type – DataType of the keys in the map
value_type – DataType of the values in the map
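To make the "list of structs with two fields, key and value" layout concrete, here is a minimal pure-Python sketch that converts a dict to that shape and back. The helper names are hypothetical, and this only illustrates the logical layout described above, not Daft's physical storage.

```python
def to_key_value_structs(mapping):
    """Represent a dict as a list of {"key": ..., "value": ...} records."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def from_key_value_structs(entries):
    """Rebuild a dict from the list-of-structs representation."""
    return {entry["key"]: entry["value"] for entry in entries}


entries = to_key_value_structs({"a": 1, "b": 2})
print(entries)
print(from_key_value_structs(entries))
```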
- property precision: int#
If this is a decimal type, return the precision, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.precision == 10
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.precision
... except AttributeError:
...     pass
- classmethod python() DataType [source]#
Create a Python DataType: a type which refers to an arbitrary Python object.
- property scale: int#
If this is a decimal type, return the scale, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.scale == 2
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.scale
... except AttributeError:
...     pass
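As a rough illustration of what precision and scale mean together, the sketch below assumes the usual decimal convention: precision is the total number of digits and scale is the number of digits after the decimal point. The fits_decimal helper is hypothetical and uses only Python's decimal module; it does not call Daft.

```python
from decimal import Decimal


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Check whether `value` is exactly representable with the given precision and scale."""
    d = Decimal(value)
    if not d.is_finite():
        return False
    _, digits, exponent = d.as_tuple()
    digits = list(digits)
    # Drop trailing fractional zeros: "1.50" needs only one fractional digit.
    while exponent < 0 and len(digits) > 1 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    if all(digit == 0 for digit in digits):
        return True  # zero fits any precision and scale
    fractional_digits = max(-exponent, 0)
    integer_digits = max(len(digits) + exponent, 0)
    return fractional_digits <= scale and integer_digits <= precision - scale


print(fits_decimal("12345678.90", precision=10, scale=2))
print(fits_decimal("123456789.00", precision=10, scale=2))
print(fits_decimal("1.234", precision=10, scale=2))
```

With precision=10 and scale=2, at most eight digits are available before the decimal point and two after it.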
- property shape: tuple[int, ...]#
If this is a fixed shape type, return the shape, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.shape == (2, 3)
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> try:
...     dtype.shape
... except AttributeError:
...     pass
- property size: int#
If this is a fixed size type, return the size, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.size == 10
>>> dtype = daft.DataType.binary()
>>> try:
...     dtype.size
... except AttributeError:
...     pass
- classmethod sparse_tensor(dtype: DataType, shape: tuple[int, ...] | None = None, use_offset_indices: builtins.bool = False) DataType [source]#
Create a SparseTensor DataType: SparseTensor arrays are implemented as a ‘COO Sparse Tensor’ representation of n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.
If a shape is given, each ndarray in the column will have this shape.
If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.
The use_offset_indices parameter determines how the indices of the SparseTensor are stored:
- False (default): Indices represent the actual positions of nonzero values.
- True: Indices represent the offsets between consecutive nonzero values. This can improve compression efficiency, especially when nonzero values are clustered together, as offsets between them are often zero, making them easier to compress.
- Parameters:
dtype – The type of the data contained within the tensor elements.
shape – The shape of each SparseTensor in the column. This is None by default, which allows the shapes of each tensor element to vary.
use_offset_indices – Determines how indices are represented. Defaults to False (storing actual indices). If True, stores offsets between nonzero indices.
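The difference between the two index representations can be sketched in pure Python on a flattened 1-D array. This is an illustration under assumptions, not Daft's storage format: the helper names are hypothetical, and the offset encoding shown (first index kept as-is, each later index stored as the gap from the previous one) is one plausible reading of "offsets between consecutive nonzero values".

```python
from itertools import accumulate


def to_coo(dense):
    """Return (indices, values) for the nonzero entries of a flat list."""
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in indices]
    return indices, values


def to_offsets(indices):
    """Delta-encode sorted indices: keep the first, store the rest as gaps."""
    return [idx - prev for prev, idx in zip([0] + indices[:-1], indices)]


def from_offsets(offsets):
    """Recover absolute positions with a running sum over the gaps."""
    return list(accumulate(offsets))


dense = [0, 0, 5, 7, 9, 0, 0, 0, 3]
indices, values = to_coo(dense)
offsets = to_offsets(indices)
print(indices, values)
print(offsets)
print(from_offsets(offsets))
```

When nonzero values are clustered, the gaps are small and repetitive, which is why an offset encoding tends to compress better than absolute positions.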
- classmethod struct(fields: dict[str, daft.datatype.DataType]) DataType [source]#
Create a Struct DataType: a nested type which has names mapped to child types.
Example
>>> from daft import DataType
>>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})
- Parameters:
fields – Nested fields of the Struct
- classmethod tensor(dtype: DataType, shape: Optional[tuple[int, ...]] = None) DataType [source]#
Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.
If a shape is given, each ndarray in the column will have this shape.
If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.
- Parameters:
dtype – The type of the data contained within the tensor elements.
shape – The shape of each tensor in the column. This is None by default, which allows the shapes of each tensor element to vary.
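Whether a fixed shape can be declared depends on whether every element in the column shares one shape. The sketch below shows one way to check that for nested Python lists. Both helpers are hypothetical, and nested_shape assumes rectangular input because it inspects only the first element at each level.

```python
def nested_shape(data):
    """Shape of a rectangular nested list, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return tuple(shape)


def common_shape(tensors):
    """Return the shape shared by all tensors, or None if shapes differ."""
    shapes = {nested_shape(t) for t in tensors}
    return shapes.pop() if len(shapes) == 1 else None


uniform = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 1, 2]]]
ragged = [[[1, 2, 3]], [[1, 2], [3, 4]]]
print(common_shape(uniform))
print(common_shape(ragged))
```

A result such as (2, 3) suggests the column is a candidate for a fixed shape; None suggests leaving the shape unspecified.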
- classmethod time(timeunit: daft.datatype.TimeUnit | str) DataType [source]#
Time DataType. Supported timeunits are “us”, “ns”.
- classmethod timestamp(timeunit: daft.datatype.TimeUnit | str, timezone: Optional[str] = None) DataType [source]#
Timestamp DataType.
- property timeunit: TimeUnit#
If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> dtype.timeunit
TimeUnit(ns)
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timeunit
... except AttributeError:
...     pass
- property timezone: str | None#
If this is a timestamp type, return the timezone, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns", timezone="UTC")
>>> assert dtype.timezone == "UTC"
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timezone
... except AttributeError:
...     pass
- property use_offset_indices: builtins.bool#
If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), use_offset_indices=True)
>>> assert dtype.use_offset_indices
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.use_offset_indices
... except AttributeError:
...     pass
- property value_type: DataType#
If this is a map type, return the value type, otherwise an attribute error is raised.
Example
>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.value_type == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.value_type
... except AttributeError:
...     pass