Series
- class Series
A Daft Series is an array of data of a single type, and is usually a column in a DataFrame.
- arctan2(other: Series) → Series
Calculates the four-quadrant arctangent of coordinates (y, x).
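As a rough illustration of what "four-quadrant" means here, the sketch below uses Python's standard `math.atan2` on plain lists. It assumes the calling Series supplies the y values and `other` supplies the x values; the helper name `arctan2_elementwise` is hypothetical and not part of Daft.

```python
import math

def arctan2_elementwise(ys, xs):
    """Four-quadrant arctangent of each (y, x) pair, in radians."""
    if len(ys) != len(xs):
        raise ValueError("inputs must have the same length")
    # Unlike atan(y / x), atan2 uses the signs of both inputs to pick
    # the quadrant, so the result lies in [-pi, pi].
    return [math.atan2(y, x) for y, x in zip(ys, xs)]

# One point in each quadrant (assumed roles: ys = self, xs = other).
ys = [1.0, 1.0, -1.0, -1.0]
xs = [1.0, -1.0, -1.0, 1.0]
angles = arctan2_elementwise(ys, xs)
```

The four results differ even though `y / x` takes only two distinct values across these points, which is the reason to prefer a two-argument arctangent.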
- static from_arrow(array: pyarrow.lib.Array | pyarrow.lib.ChunkedArray, name: str = 'arrow_series') → Series
Construct a Series from a pyarrow array or chunked array.
- Parameters:
array – The pyarrow (chunked) array whose data we wish to put in the Series.
name – The name associated with the Series; this is usually the column name.
- classmethod from_numpy(data: ndarray, name: str = 'numpy_series') → Series
Construct a Series from a NumPy ndarray.
If the provided NumPy ndarray is 1-dimensional, Daft will attempt to store the ndarray in a pyarrow Array. If the ndarray has more than one dimension, or if storing the 1D array in Arrow fails, Daft will store the ndarray data as a Python list of NumPy ndarrays.
- Parameters:
data – The NumPy ndarray whose data we wish to put in the Series.
name – The name associated with the Series; this is usually the column name.
- classmethod from_pandas(data: Series, name: str = 'pd_series') → Series
Construct a Series from a pandas Series.
This first tries to convert the series into a pyarrow array. If that fails, it falls back to converting the series to a NumPy ndarray and going through that construction path. If that also fails, it falls back to converting the series to a Python list and going through that path.
- Parameters:
data – The pandas Series whose data we wish to put in the Daft Series.
name – The name associated with the Series; this is usually the column name.
- static from_pylist(data: list, name: str = 'list_series', pyobj: str = 'allow') → Series
Construct a Series from a Python list.
- The resulting type depends on the setting of pyobj:
"allow": Arrow-backed types if possible, else PyObject;
"disallow": Arrow-backed types only, raising an error if not convertible;
"force": Store as PyObject types.
- Parameters:
data – The Python list whose data we wish to put in the Series.
name – The name associated with the Series; this is usually the column name.
pyobj – Whether we want to "allow" coercion to Arrow types, "disallow" falling back to Python type representation, or "force" the data to only have a Python type representation. Default is "allow".
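The three `pyobj` modes can be read as a small decision policy. The sketch below models that policy in plain Python. It is only an illustration: the function `choose_storage` is hypothetical, and its convertibility test (all non-null values share one primitive type) is a deliberately simplified stand-in for Daft's actual type inference.

```python
def choose_storage(data, pyobj="allow"):
    """Return "arrow" or "python" for a list under the given pyobj mode."""
    if pyobj not in ("allow", "disallow", "force"):
        raise ValueError(f"unknown pyobj mode: {pyobj!r}")
    if pyobj == "force":
        # Always keep the values as Python objects.
        return "python"

    # Simplified assumption: convertible means one primitive type overall.
    primitive = (bool, int, float, str, bytes)
    kinds = {type(v) for v in data if v is not None}
    convertible = len(kinds) <= 1 and all(k in primitive for k in kinds)

    if convertible:
        return "arrow"
    if pyobj == "disallow":
        # No fallback permitted, so conversion failure is an error.
        raise TypeError("data is not convertible to an Arrow-backed type")
    # "allow": fall back to a Python object representation.
    return "python"
```

Under this model, `choose_storage([1, 2, None])` picks the Arrow-backed path, while a mixed list such as `[1, "a"]` falls back to Python objects with `"allow"` and raises with `"disallow"`.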
- log(base: float) → Series
Computes the elementwise logarithm of a numeric series with the given base.
- Parameters:
base – The base of the logarithm.
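For reference, a logarithm in an arbitrary base follows from the change-of-base identity log_b(x) = ln(x) / ln(b). The sketch below applies it elementwise to a plain list of positive numbers. The helper `log_elementwise` is hypothetical, and how Daft treats nulls or non-positive values is not covered here.

```python
import math

def log_elementwise(values, base):
    """Elementwise log of positive numbers in the given base."""
    ln_base = math.log(base)
    # Change of base: log_b(x) = ln(x) / ln(b).
    return [math.log(v) / ln_base for v in values]

logs = log_elementwise([1.0, 2.0, 8.0], base=2.0)
```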
- minhash(num_hashes: int, ngram_size: int, seed: int = 1, hash_function: Literal['murmurhash3', 'xxhash', 'sha1'] = 'murmurhash3') → Series
Runs the MinHash algorithm on the series.
For a string, calculates the minimum hash over all its ngrams, repeating with num_hashes permutations. Returns a list of 32-bit unsigned integers. Tokens for the ngrams are delimited by spaces. MurmurHash is used for the initial hash by default; see hash_function.
- Parameters:
num_hashes – The number of hash permutations to compute.
ngram_size – The number of tokens in each shingle/ngram.
seed (optional) – Seed used for generating permutations and the initial string hashes. Defaults to 1.
hash_function (optional) – Hash function to use for initial string hashing. One of "murmurhash3", "xxhash", or "sha1". Defaults to "murmurhash3".
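The general shape of MinHash over space-delimited ngrams can be sketched with the standard library alone. This is a toy model, not Daft's implementation: it uses SHA-1 truncated to 32 bits as the initial hash, a seeded family of `(a * h + b) mod p` functions in place of the permutations, and a hypothetical function name, so its output will not match the values Daft produces.

```python
import hashlib
import random

PRIME = (1 << 61) - 1    # modulus for the hash family (an assumption)
MAX_U32 = (1 << 32) - 1  # results are reduced to 32-bit unsigned values

def initial_hash(ngram):
    """Hash an ngram to 32 bits; truncated SHA-1 is a stand-in choice."""
    digest = hashlib.sha1(ngram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")

def minhash_sketch(text, num_hashes, ngram_size, seed=1):
    """Return num_hashes minimum hash values for one string."""
    tokens = text.split(" ")  # tokens are delimited by spaces
    last_start = max(len(tokens) - ngram_size, 0)
    ngrams = [" ".join(tokens[i:i + ngram_size])
              for i in range(last_start + 1)]
    hashes = [initial_hash(g) for g in ngrams]

    # One (a, b) pair per simulated permutation, derived from the seed.
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
              for _ in range(num_hashes)]

    # For each simulated permutation, keep the minimum over all ngrams.
    return [min(((a * h + b) % PRIME) & MAX_U32 for h in hashes)
            for a, b in params]

sig_a = minhash_sketch("the quick brown fox jumps", num_hashes=4, ngram_size=2)
sig_b = minhash_sketch("the quick brown fox jumps", num_hashes=4, ngram_size=2)
```

Because the parameters depend only on the seed, equal strings always yield equal signatures, and the fraction of matching positions between two signatures serves as an estimate of the Jaccard similarity of their ngram sets.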
- size_bytes() → int
Returns the total size of all buffers used for representing this Series.
In particular, this includes:
Buffer(s) used for data (accounting for any slicing that has been applied)
Buffer(s) used for offsets, if applicable (for variable-length arrow types)
Buffer(s) used for validity, if applicable (arrow can choose to omit the validity bitmask)
The result of recursively calling .size_bytes on any child arrays, if applicable (for nested types)
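To make the buffer accounting concrete, the sketch below estimates the size of a variable-length string array under the Arrow layout: a data buffer, an offsets buffer with one more entry than there are values, and an optional validity bitmap. It is a simplified model with assumptions stated in the comments: the function name is hypothetical, buffer padding is ignored, and the offset width (4 bytes for Arrow `utf8`, 8 for `large_utf8`) is a parameter because the type Daft uses internally is not stated here.

```python
def utf8_array_size_bytes(values, offset_width=4):
    """Estimate the buffer sizes of an Arrow-style string array, unpadded."""
    n = len(values)
    # Data buffer: the concatenated UTF-8 bytes of all non-null values.
    data = sum(len(v.encode("utf-8")) for v in values if v is not None)
    # Offsets buffer: n + 1 entries marking where each value starts and ends.
    offsets = (n + 1) * offset_width
    # Validity bitmap: one bit per value, omitted when there are no nulls.
    has_nulls = any(v is None for v in values)
    validity = (n + 7) // 8 if has_nulls else 0
    return data + offsets + validity

no_nulls = utf8_array_size_bytes(["ab", "cde"])
with_null = utf8_array_size_bytes(["ab", None, "cde"])
```

In this model the array with a null is larger for two reasons: it has one more offset entry, and it needs a validity bitmap.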