Series
- class daft.Series
A Daft Series is an array of data of a single type, and is usually a column in a DataFrame.
- static from_arrow(array: pyarrow.lib.Array | pyarrow.lib.ChunkedArray, name: str = 'arrow_series') → Series
Construct a Series from a pyarrow array or chunked array.
- Parameters:
array – The pyarrow (chunked) array whose data we wish to put in the Series.
name – The name associated with the Series; this is usually the column name.
- classmethod from_numpy(data: ndarray, name: str = 'numpy_series') → Series
Construct a Series from a NumPy ndarray.
If the provided NumPy ndarray is 1-dimensional, Daft attempts to store it in a pyarrow Array. If the ndarray has more than one dimension, or if storing the 1-D array in Arrow fails, Daft stores the data as a Python list of NumPy ndarrays.
- Parameters:
data – The NumPy ndarray whose data we wish to put in the Series.
name – The name associated with the Series; this is usually the column name.
- classmethod from_pandas(data: Series, name: str = 'pd_series') → Series
Construct a Series from a pandas Series.
This first tries to convert the series into a pyarrow array. If that fails, it converts the series to a NumPy ndarray and uses that construction path. As a last resort, it converts the series to a Python list and uses that path.
- Parameters:
data – The pandas Series whose data we wish to put in the Daft Series.
name – The name associated with the Series; this is usually the column name.
- static from_pylist(data: list, name: str = 'list_series', pyobj: str = 'allow') → Series
Construct a Series from a Python list.
- The resulting type depends on the setting of pyobj:
"allow": Arrow-backed types if possible, else PyObject;
"disallow": Arrow-backed types only, raising an error if not convertible;
"force": store as PyObject types.
- Parameters:
data – The Python list whose data we wish to put in the Series.
name – The name associated with the Series; this is usually the column name.
pyobj – Whether to "allow" coercion to Arrow types (falling back to a Python type representation), "disallow" falling back to a Python type representation, or "force" the data to have only a Python type representation. Default is "allow".
- size_bytes() → int
Returns the total size of all buffers used to represent this Series.
In particular, this includes:
Buffer(s) used for data (accounting for any slicing)
Buffer(s) used for offsets, if applicable (for variable-length Arrow types)
Buffer(s) used for validity, if applicable (Arrow can choose to omit the validity bitmask)
The sizes of any child arrays, obtained by calling .size_bytes recursively, if applicable (for nested types)