Series#

class daft.Series[source]#

A Daft Series is an array of data of a single type, and is usually a column in a DataFrame.

static from_arrow(array: pyarrow.lib.Array | pyarrow.lib.ChunkedArray, name: str = 'arrow_series') Series[source]#

Construct a Series from a pyarrow array or chunked array.

Parameters:
  • array – The pyarrow (chunked) array whose data we wish to put in the Series.

  • name – The name associated with the Series; this is usually the column name.
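
A minimal usage sketch, assuming pyarrow is installed alongside Daft (the data and column names are illustrative):

    import pyarrow as pa
    import daft

    # From a plain pyarrow Array.
    arr = pa.array([1, 2, 3, None])
    ints = daft.Series.from_arrow(arr, name="ints")

    # A ChunkedArray (e.g. the result of concatenating several arrays) is also accepted.
    chunked = pa.chunked_array([pa.array(["a", "b"]), pa.array(["c"])])
    letters = daft.Series.from_arrow(chunked, name="letters")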

classmethod from_numpy(data: ndarray, name: str = 'numpy_series') Series[source]#

Construct a Series from a NumPy ndarray.

If the provided NumPy ndarray is 1-dimensional, Daft will attempt to store the ndarray in a pyarrow Array. If the ndarray has more than one dimension, or storing the 1-dimensional array in Arrow fails, Daft will store the ndarray data as a Python list of NumPy ndarrays.

Parameters:
  • data – The NumPy ndarray whose data we wish to put in the Series.

  • name – The name associated with the Series; this is usually the column name.
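
A sketch of the two storage paths described above (the data and names are illustrative):

    import numpy as np
    import daft

    # 1-dimensional ndarray: Daft attempts to store this in a pyarrow array.
    floats = daft.Series.from_numpy(np.array([1.0, 2.0, 3.0]), name="floats")

    # 2-dimensional ndarray: stored as a Python list of NumPy ndarrays instead.
    rows = daft.Series.from_numpy(np.arange(6).reshape(2, 3), name="matrix_rows")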

classmethod from_pandas(data: Series, name: str = 'pd_series') Series[source]#

Construct a Series from a pandas Series.

This will first try to convert the series into a pyarrow array. If that fails, it falls back to converting the series to a NumPy ndarray and going through that construction path, and finally falls back to converting the series to a Python list and going through that path.

Parameters:
  • data – The pandas Series whose data we wish to put in the Daft Series.

  • name – The name associated with the Series; this is usually the column name.
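
A minimal sketch, assuming pandas is installed alongside Daft:

    import pandas as pd
    import daft

    # Conversion tries pyarrow first, then NumPy, then a plain Python list,
    # as described above.
    measurements = daft.Series.from_pandas(pd.Series([1.5, 2.5, None]), name="measurements")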

static from_pylist(data: list, name: str = 'list_series', pyobj: str = 'allow') Series[source]#

Construct a Series from a Python list.

The resulting type depends on the setting of pyobj:
  • "allow": Arrow-backed types if possible, else PyObject;

  • "disallow": Arrow-backed types only, raising an error if not convertible;

  • "force": Store as PyObject types.

Parameters:
  • data – The Python list whose data we wish to put in the Series.

  • name – The name associated with the Series; this is usually the column name.

  • pyobj – Whether we want to "allow" coercion to Arrow types, "disallow" falling back to Python type representation, or "force" the data to only have a Python type representation. Default is "allow".
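
A sketch of the three pyobj modes on the same (Arrow-convertible) input:

    import daft

    data = [1, 2, 3]

    allowed = daft.Series.from_pylist(data, name="nums")                       # Arrow-backed if possible
    arrow_only = daft.Series.from_pylist(data, name="nums", pyobj="disallow")  # would raise if not Arrow-convertible
    forced = daft.Series.from_pylist(data, name="nums", pyobj="force")         # always stored as Python objects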

size_bytes() int[source]#

Returns the total size in bytes of all buffers used to represent this Series.

In particular, this includes:

  1. Buffer(s) used for data (accounting for any slicing applied to the Series)

  2. Buffer(s) used for offsets, if applicable (for variable-length Arrow types)

  3. Buffer(s) used for validity, if applicable (Arrow can choose to omit the validity bitmask)

  4. The size_bytes of any child arrays, computed recursively, if applicable (for nested types)
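
For example, a sketch comparing a fixed-width column with a variable-length string column (the exact byte counts depend on Arrow's buffer layout, so none are shown):

    import daft

    ints = daft.Series.from_pylist([1, 2, 3, 4], name="ints")
    strings = daft.Series.from_pylist(["a", "bb", "ccc"], name="strings")

    # The string Series additionally accounts for its offsets buffer.
    print(ints.size_bytes(), strings.size_bytes())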

to_arrow(cast_tensors_to_ray_tensor_dtype: bool = False) Array[source]#

Convert this Series to a pyarrow array.
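
A minimal sketch, using the default arguments:

    import daft

    s = daft.Series.from_pylist([1, 2, None], name="ints")
    arrow_array = s.to_arrow()  # a pyarrow array mirroring the Series' data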

to_pylist() list[source]#

Convert this Series to a Python list.
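
A minimal sketch:

    import daft

    s = daft.Series.from_pylist(["x", "y", None], name="letters")
    values = s.to_pylist()  # back to a plain Python list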