Series

class Series

A Daft Series is an array of data of a single type, and is usually a column in a DataFrame.

arccos() → Series

The elementwise arc cosine of a numeric series.

arccosh() → Series

The elementwise inverse hyperbolic cosine of a numeric series.

arcsin() → Series

The elementwise arc sine of a numeric series.

arcsinh() → Series

The elementwise inverse hyperbolic sine of a numeric series.

arctan() → Series

The elementwise arc tangent of a numeric series.

arctan2(other: Series) → Series

Calculates the elementwise four-quadrant arctangent of coordinates (y, x).
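As a plain-Python illustration of what "four-quadrant" means, the sketch below uses the standard library's math.atan2 rather than Daft itself. It assumes the calling series supplies the y values and other supplies the x values, which is a reading of the "(y, x)" wording and not something this page confirms.

```python
import math

# ys stands in for the calling series, xs for `other` (assumed argument order).
ys = [1.0, 1.0, -1.0, -1.0]
xs = [1.0, -1.0, -1.0, 1.0]

# Elementwise four-quadrant arctangent, in radians: the signs of both
# coordinates decide which quadrant the angle falls in.
angles = [math.atan2(y, x) for y, x in zip(ys, xs)]

# A single-argument arctangent of the ratio loses that information:
# the points (1, 1) and (-1, -1) produce the same angle.
ratio_angles = [math.atan(y / x) for y, x in zip(ys, xs)]
```

The first and third points lie in opposite quadrants, so their atan2 results differ by pi, while the ratio-based results are identical.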

arctanh() → Series

The elementwise inverse hyperbolic tangent of a numeric series.

cos() → Series

The elementwise cosine of a numeric series.

cot() → Series

The elementwise cotangent of a numeric series.
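For reference, the cotangent is the ratio cos(x) / sin(x). The plain-Python sketch below shows that definition applied to each element. It is not Daft's implementation, and it assumes inputs are in radians. How Daft treats inputs where the sine is zero is not stated on this page.

```python
import math


def cot(x: float) -> float:
    """Cotangent of x (in radians), computed as cos(x) / sin(x)."""
    return math.cos(x) / math.sin(x)


values = [math.pi / 4, math.pi / 2]
# Apply the function to each element, mirroring an elementwise operation.
cots = [cot(v) for v in values]
```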

degrees() → Series

Converts each element of a numeric series from radians to degrees.
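The conversion can be sketched in plain Python with the standard library. This only illustrates the arithmetic (multiplying by 180 / pi) and does not call Daft.

```python
import math

radians_values = [0.0, math.pi / 2, math.pi]

# Radians to degrees: multiply each element by 180 / pi.
degrees_values = [math.degrees(r) for r in radians_values]

# Converting back with math.radians recovers the original values
# (up to floating-point rounding).
round_trip = [math.radians(d) for d in degrees_values]
```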

exp() → Series

The elementwise exponential (e raised to the power of each element) of a numeric series.

static from_arrow(array: pyarrow.lib.Array | pyarrow.lib.ChunkedArray, name: str = 'arrow_series') → Series

Construct a Series from a pyarrow array or chunked array.

Parameters:
  • array – The pyarrow (chunked) array whose data we wish to put in the Series.

  • name – The name associated with the Series; this is usually the column name.

classmethod from_numpy(data: ndarray, name: str = 'numpy_series') → Series

Construct a Series from a NumPy ndarray.

If the provided NumPy ndarray is 1-dimensional, Daft will attempt to store it in a pyarrow Array. If the ndarray has more than one dimension, or if storing the 1-dimensional array in Arrow fails, Daft will store the data as a Python list of NumPy ndarrays.

Parameters:
  • data – The NumPy ndarray whose data we wish to put in the Series.

  • name – The name associated with the Series; this is usually the column name.

classmethod from_pandas(data: Series, name: str = 'pd_series') → Series

Construct a Series from a pandas Series.

This first tries to convert the series into a pyarrow array. If that fails, it falls back to converting the series to a NumPy ndarray and using that construction path. As a last resort, it converts the series to a Python list and uses that path.

Parameters:
  • data – The pandas Series whose data we wish to put in the Daft Series.

  • name – The name associated with the Series; this is usually the column name.
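The fallback order described above follows a common "try the strictest path first" pattern, sketched below in plain Python. Everything here is hypothetical: the helper names (arrow_path, numpy_path, pylist_path, convert_with_fallback) and their acceptance rules are invented stand-ins for illustration and are not Daft functions or Daft's actual conversion criteria.

```python
from typing import Any, Callable, List, Tuple

ConversionPath = Tuple[str, Callable[[List[Any]], List[Any]]]


def arrow_path(values: List[Any]) -> List[Any]:
    # Hypothetical stand-in: accept only values that share a single type.
    kinds = {type(v) for v in values if v is not None}
    if len(kinds) > 1:
        raise TypeError("mixed element types")
    return list(values)


def numpy_path(values: List[Any]) -> List[Any]:
    # Hypothetical stand-in: accept only numeric values.
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("non-numeric element")
    return [float(v) for v in values]


def pylist_path(values: List[Any]) -> List[Any]:
    # Last resort: arbitrary Python objects are kept as-is.
    return list(values)


def convert_with_fallback(
    values: List[Any], paths: List[ConversionPath]
) -> Tuple[str, List[Any]]:
    """Try each path in order and report which one succeeded."""
    for name, convert in paths:
        try:
            return name, convert(values)
        except (TypeError, ValueError):
            continue  # fall through to the next, more permissive path
    raise ValueError("no conversion path accepted the data")


# Strictest path first, most permissive last.
PATHS: List[ConversionPath] = [
    ("arrow", arrow_path),
    ("numpy", numpy_path),
    ("pylist", pylist_path),
]
```

With these stand-ins, a list of integers is handled by the first path, a mix of integers and floats falls through to the second, and a mix of numbers and strings ends up on the list path.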

static from_pylist(data: list, name: str = 'list_series', pyobj: str = 'allow') → Series

Construct a Series from a Python list.

The resulting type depends on the setting of pyobj:
  • "allow": Arrow-backed types if possible, else PyObject;

  • "disallow": Arrow-backed types only, raising an error if not convertible;

  • "force": Store as PyObject types.

Parameters:
  • data – The Python list whose data we wish to put in the Series.

  • name – The name associated with the Series; this is usually the column name.

  • pyobj – How to handle data that cannot be stored in an Arrow-backed type: "allow" falling back to a Python type representation, "disallow" that fallback, or "force" the data to only have a Python type representation. Default is "allow".
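The three modes can be read as a small decision rule, sketched below in plain Python. The function choose_storage, the ARROW_LIKE_TYPES tuple, and the "single primitive type" convertibility test are all hypothetical simplifications introduced for illustration. Daft's real convertibility check and the exception type it raises are not described on this page.

```python
# Assumed set of Python types that map cleanly onto Arrow-backed types.
ARROW_LIKE_TYPES = (bool, int, float, str, bytes)


def choose_storage(data: list, pyobj: str = "allow") -> str:
    """Return "arrow" or "python" following the three pyobj modes."""
    if pyobj not in ("allow", "disallow", "force"):
        raise ValueError(f"unknown pyobj mode: {pyobj!r}")
    if pyobj == "force":
        return "python"  # always keep a Python object representation

    # Simplified convertibility test: at most one non-null element type,
    # and that type is one of the assumed Arrow-friendly primitives.
    kinds = {type(v) for v in data if v is not None}
    convertible = len(kinds) <= 1 and all(k in ARROW_LIKE_TYPES for k in kinds)

    if convertible:
        return "arrow"
    if pyobj == "disallow":
        raise TypeError("data is not convertible to an Arrow-backed type")
    return "python"  # "allow": fall back to Python objects
```

Under these assumptions, choose_storage([1, 2, None]) selects "arrow", while a list of arbitrary objects selects "python" under "allow" and raises under "disallow".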

ln() → Series

The elementwise natural logarithm (ln) of a numeric series.

log(base: float) → Series

The elementwise logarithm of a numeric series, using the given base.

Parameters:
  • base – The base of the logarithm.
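A logarithm in an arbitrary base can be expressed through the natural logarithm via the change-of-base identity log_b(x) = ln(x) / ln(b). The plain-Python sketch below shows that identity applied elementwise. The helper name log_base is hypothetical, and this is not a claim about how Daft computes the result internally.

```python
import math
from typing import List


def log_base(values: List[float], base: float) -> List[float]:
    """Elementwise logarithm using the change-of-base identity."""
    ln_base = math.log(base)  # computed once, reused for every element
    return [math.log(v) / ln_base for v in values]


result = log_base([1.0, 8.0, 64.0], 2.0)
```

Here the inputs are powers of two, so the results are close to 0, 3, and 6, subject to floating-point rounding.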

log10() → Series

The elementwise base-10 logarithm of a numeric series.

log2() → Series

The elementwise base-2 logarithm of a numeric series.

minhash(num_hashes: int, ngram_size: int, seed: int = 1, hash_function: Literal['murmurhash3', 'xxhash', 'sha1'] = 'murmurhash3') → Series

Runs the MinHash algorithm on the series.

For each string, calculates the minimum hash over all its ngrams, repeating with num_hashes permutations. The result is returned as a list of 32-bit unsigned integers.

Tokens for the ngrams are delimited by spaces. The initial hash uses the function selected by hash_function (MurmurHash3 by default).

Parameters:
  • num_hashes – The number of hash permutations to compute.

  • ngram_size – The number of tokens in each shingle/ngram.

  • seed (optional) – Seed used for generating permutations and the initial string hashes. Defaults to 1.

  • hash_function (optional) – Hash function to use for initial string hashing. One of "murmurhash3", "xxhash", or "sha1". Defaults to "murmurhash3".
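The steps above can be sketched in plain Python. This is a minimal illustration of the MinHash idea, not Daft's implementation: it uses SHA-1 from the standard library for the initial hash (the standard library has no MurmurHash3), it draws permutations of the form (a * h + b) mod p from a seeded random generator, and its outputs will not match the values Daft produces. All function names are hypothetical.

```python
import hashlib
import random
from typing import List

PRIME = (1 << 61) - 1    # modulus for the permutation functions (assumed choice)
MAX_U32 = (1 << 32) - 1  # results are kept to 32 unsigned bits


def word_ngrams(text: str, ngram_size: int) -> List[str]:
    """Split on spaces and join each run of ngram_size consecutive tokens."""
    tokens = [t for t in text.split(" ") if t]
    count = max(len(tokens) - ngram_size + 1, 1)
    return [" ".join(tokens[i:i + ngram_size]) for i in range(count)]


def initial_hash(ngram: str, seed: int) -> int:
    """Seeded 32-bit hash of one ngram (SHA-1 stands in for the default hash)."""
    digest = hashlib.sha1(f"{seed}:{ngram}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def minhash_sketch(text: str, num_hashes: int, ngram_size: int, seed: int = 1) -> List[int]:
    """For each permutation, keep the minimum permuted ngram hash."""
    rng = random.Random(seed)
    perms = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]
    hashes = [initial_hash(g, seed) for g in word_ngrams(text, ngram_size)]
    return [min(((a * h + b) % PRIME) & MAX_U32 for h in hashes) for a, b in perms]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


sig_a = minhash_sketch("the quick brown fox jumps", num_hashes=16, ngram_size=2)
sig_b = minhash_sketch("the quick brown fox leaps", num_hashes=16, ngram_size=2)
similarity = estimated_similarity(sig_a, sig_b)
```

Two strings that share most of their ngrams tend to agree in many signature positions, which is what makes the signatures useful for near-duplicate detection. A larger num_hashes gives a less noisy estimate at the cost of a longer signature.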

radians() → Series

Converts each element of a numeric series from degrees to radians.

sin() → Series

The elementwise sine of a numeric series.

size_bytes() → int

Returns the total size, in bytes, of all buffers used to represent this Series.

In particular, this includes:

  1. Buffer(s) used for data (taking any slicing into account)

  2. Buffer(s) used for offsets, if applicable (for variable-length Arrow types)

  3. Buffer(s) used for validity, if applicable (Arrow can choose to omit the validity bitmask)

  4. The size_bytes of any child arrays, computed recursively, if applicable (for nested types)
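To make the first three components concrete, the sketch below estimates the buffer sizes for a variable-length string column in plain Python. It is a rough model under stated assumptions, not a reproduction of size_bytes(): it assumes 64-bit offsets by default (the offset_width parameter is hypothetical, and 4 would model 32-bit offsets), one validity bit per element only when nulls are present, and no padding or alignment, so real values may differ.

```python
from typing import List, Optional


def string_array_size_bytes(values: List[Optional[str]], offset_width: int = 8) -> int:
    """Estimate data + offsets + validity bytes for a string column."""
    # 1. Data buffer: the UTF-8 bytes of every non-null string.
    data = sum(len(v.encode("utf-8")) for v in values if v is not None)

    # 2. Offsets buffer: one offset per element, plus one trailing offset.
    offsets = (len(values) + 1) * offset_width

    # 3. Validity bitmask: one bit per element, omitted when there are no nulls.
    has_nulls = any(v is None for v in values)
    validity = (len(values) + 7) // 8 if has_nulls else 0

    return data + offsets + validity


with_null = string_array_size_bytes(["a", "bc", None])  # 3 + 32 + 1
without_null = string_array_size_bytes(["a", "bc"])     # 3 + 24 + 0
```

For nested types, the same accounting would be applied to each child array and the results summed, which is the recursive step in item 4.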

tan() → Series

The elementwise tangent of a numeric series.

to_arrow() → Array

Convert this Series to a pyarrow array.

to_pylist() → list

Convert this Series to a Python list.