User Defined Functions (UDFs)

daft.udf(*, return_dtype: DataType) -> Callable[[Callable[[...], Union[Series, ndarray, list]]], UDF]

Decorator that converts a Python function into a UDF.

UDFs allow users to run arbitrary Python code on the outputs of Expressions.

Note

In most cases, UDFs will be slower than a native kernel/expression because of the Rust and Python overheads they incur. If your computation can be expressed using Daft expressions, you should do so instead of writing a UDF. If your UDF covers a common use case that Daft does not already support, you should file a ticket or contribute the functionality back to Daft as a kernel!

In the example below, we create a UDF that:

  1. Receives data under the argument name x

  2. Converts the x Daft Series into a Python list using x.to_pylist()

  3. Adds a Python constant value c to every element in x

  4. Returns a new list of Python values which will be coerced to the specified return type: return_dtype=DataType.int64().

The UDF can then be called on a dataframe using any of the dataframe projection operations (df.with_column(), df.select(), etc.).

Example

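>>> import daft
>>> from daft import DataType, Series, udf
>>>
>>> @udf(return_dtype=DataType.int64())
... def add_constant(x: Series, c=10):
...     return [v + c for v in x.to_pylist()]
...
>>> # Illustrative dataframe; any dataframe with an integer column "x" works
>>> df = daft.from_pydict({"x": [1, 2, 3]})
>>> df = df.with_column("new_x", add_constant(df["x"], c=20))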

Parameters:

return_dtype (DataType) – Returned type of the UDF

Returns:

UDF decorator - converts a user-provided Python function into a UDF that can be called on Expressions

Return type:

Callable[[UserProvidedPythonFunction], UDF]