Expressions#

Daft Expressions allow you to express some computation that needs to happen in a DataFrame.

This page provides an overview of all the functionality that is provided by Daft Expressions.

Constructors#

col

Creates an Expression referring to the column with the provided name

lit

Creates an Expression representing a column with every value set to the provided value

Generic#

Expression.alias

Gives the expression a new name, which is its column's name in the DataFrame schema and the name by which subsequent expressions can refer to the results of this expression.

Expression.cast

Casts an expression to the given datatype if possible

Expression.if_else

Conditionally choose values between two expressions using the current boolean expression as a condition

Expression.is_null

Checks if values in the Expression are Null (a special value indicating missing data)

Expression.not_null

Checks if values in the Expression are not Null (a special value indicating missing data)

Expression.apply

Apply a function on each value in a given expression

Numeric#

Expression.__abs__

Absolute of a numeric expression (abs(expr))

Expression.__add__

Adds two numeric expressions or concatenates two string expressions (e1 + e2)

Expression.__sub__

Subtracts two numeric expressions (e1 - e2)

Expression.__mul__

Multiplies two numeric expressions (e1 * e2)

Expression.__truediv__

True divides two numeric expressions (e1 / e2)

Expression.__mod__

Takes the mod of two numeric expressions (e1 % e2)

Expression.ceil

The ceiling of a numeric expression (expr.ceil())

Expression.floor

The floor of a numeric expression (expr.floor())

Expression.sign

The sign of a numeric expression (expr.sign())

Expression.round

The round of a numeric expression (expr.round(decimals = 0))

Logical#

Expression.__invert__

Inverts a boolean expression (~e)

Expression.__and__

Takes the logical AND of two boolean expressions (e1 & e2)

Expression.__or__

Takes the logical OR of two boolean expressions (e1 | e2)

Expression.__lt__

Compares if an expression is less than another (e1 < e2)

Expression.__le__

Compares if an expression is less than or equal to another (e1 <= e2)

Expression.__eq__

Compares if an expression is equal to another (e1 == e2)

Expression.__ne__

Compares if an expression is not equal to another (e1 != e2)

Expression.__gt__

Compares if an expression is greater than another (e1 > e2)

Expression.__ge__

Compares if an expression is greater than or equal to another (e1 >= e2)

Expression.is_in

Checks if values in the Expression are in the provided list

Aggregation#

The following can be used with DataFrame.agg or GroupedDataFrame.agg

Expression.count([mode])

Counts the number of values in the expression.

Expression.sum()

Calculates the sum of the values in the expression

Expression.mean()

Calculates the mean of the values in the expression

Expression.min()

Calculates the minimum value in the expression

Expression.max()

Calculates the maximum value in the expression

Expression.any_value([ignore_nulls])

Returns any value in the expression

Expression.agg_list()

Aggregates the values in the expression into a list

Expression.agg_concat()

Aggregates the values in the expression into a single string by concatenating them

Strings#

The following methods are available under the expr.str attribute.

Expression.str.contains

Checks whether each string contains the given pattern in a string column

Expression.str.match

Checks whether each string matches the given regular expression pattern in a string column

Expression.str.startswith

Checks whether each string starts with the given pattern in a string column

Expression.str.endswith

Checks whether each string ends with the given pattern in a string column

Expression.str.concat

Concatenates two string expressions together

Expression.str.split

Splits each string on the given pattern, into one or more strings.

Expression.str.extract

Extracts the specified match group from the first regex match in each string in a string column.

Expression.str.extract_all

Extracts the specified match group from all regex matches in each string in a string column.

Expression.str.length

Retrieves the length for a UTF-8 string column

Expression.str.lower

Convert UTF-8 string to all lowercase

Expression.str.upper

Convert UTF-8 string to all upper

Expression.str.lstrip

Strip whitespace from the left side of a UTF-8 string

Expression.str.rstrip

Strip whitespace from the right side of a UTF-8 string

Expression.str.reverse

Reverse a UTF-8 string

Expression.str.capitalize

Capitalize a UTF-8 string

Expression.str.left

Gets the n (from nchars) left-most characters of each string

Expression.str.right

Gets the n (from nchars) right-most characters of each string

Temporal#

Expression.dt.date

Retrieves the date for a datetime column

Expression.dt.hour

Retrieves the day for a datetime column

Expression.dt.day

Retrieves the day for a datetime column

Expression.dt.month

Retrieves the month for a datetime column

Expression.dt.year

Retrieves the year for a datetime column

Expression.dt.day_of_week

Retrieves the day of the week for a datetime column, starting at 0 for Monday and ending at 6 for Sunday

List#

Expression.list.join

Joins every element of a list using the specified string delimiter

Expression.list.lengths

Gets the length of each list

Expression.list.get

Gets the element at an index in each list

Struct#

Expression.struct.get

Retrieves one field from a struct column

Image#

Expression.image.decode

Decodes the binary data in this column into images.

Expression.image.encode

Encode an image column as the provided image file format, returning a binary column of encoded bytes.

Expression.image.resize

Resize image into the provided width and height.

Expression.image.crop

Crops images with the provided bounding box

Partitioning#

Expression.partitioning.days

Partitioning Transform that returns the number of days since epoch (1970-01-01)

Expression.partitioning.hours

Partitioning Transform that returns the number of hours since epoch (1970-01-01)

Expression.partitioning.months

Partitioning Transform that returns the number of months since epoch (1970-01-01)

Expression.partitioning.years

Partitioning Transform that returns the number of years since epoch (1970-01-01)

Expression.partitioning.iceberg_bucket

Partitioning Transform that returns the Hash Bucket following the Iceberg Specification of murmur3_32_x86 https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements

Expression.partitioning.iceberg_truncate

Partitioning Transform that truncates the input to a standard width w following the Iceberg Specification.

URLs#

Expression.url.download

Treats each string as a URL, and downloads the bytes contents as a bytes column

JSON#

Expression.json.query

Query JSON data in a column using a JQ-style filter https://jqlang.github.io/jq/manual/ This expression uses jaq as the underlying executor, see 01mf02/jaq for the full list of supported filters.