Expressions
Daft Expressions allow you to express some computation that needs to happen in a DataFrame.
This page provides an overview of all the functionality that is provided by Daft Expressions.
Constructors
Creates an Expression referring to the column with the provided name |
|
Creates an Expression representing a column with every value set to the provided value |
Generic
Gives the expression a new name, which is its column's name in the DataFrame schema and the name by which subsequent expressions can refer to the results of this expression. |
|
Casts an expression to the given datatype if possible |
|
Conditionally choose values between two expressions using the current boolean expression as a condition |
|
Checks if values in the Expression are Null (a special value indicating missing data) |
|
Checks if values in the Expression are not Null (a special value indicating missing data) |
|
Apply a function on each value in a given expression |
Numeric
Absolute value of a numeric expression
|
Adds two numeric expressions or concatenates two string expressions
|
Subtracts two numeric expressions
|
Multiplies two numeric expressions
|
Divides two numeric expressions using true division
|
Takes the modulo of two numeric expressions
|
The ceiling of a numeric expression
|
The floor of a numeric expression
|
The sign of a numeric expression
|
Rounds a numeric expression
Logical
Inverts a boolean expression
|
Takes the logical AND of two boolean expressions
|
Takes the logical OR of two boolean expressions
|
Compares if an expression is less than another
|
Compares if an expression is less than or equal to another
|
Compares if an expression is equal to another
|
Compares if an expression is not equal to another
|
Compares if an expression is greater than another
|
Compares if an expression is greater than or equal to another
|
Checks if values in the Expression are in the provided list |
Aggregation
The following can be used with DataFrame.agg or GroupedDataFrame.agg.
Counts the number of values in the expression |
|
Calculates the sum of the values in the expression |
|
Calculates the mean of the values in the expression |
|
Calculates the minimum value in the expression |
|
Calculates the maximum value in the expression |
|
Returns any value in the expression |
|
Aggregates the values in the expression into a list |
|
Aggregates the values in the expression into a single string by concatenating them |
Strings
The following methods are available under the expr.str attribute.
Checks whether each string contains the given pattern in a string column |
|
Checks whether each string matches the given regular expression pattern in a string column |
|
Checks whether each string starts with the given pattern in a string column |
|
Checks whether each string ends with the given pattern in a string column |
|
Concatenates two string expressions together |
|
Splits each string on the given pattern, into one or more strings. |
|
Extracts the specified match group from the first regex match in each string in a string column. |
|
Extracts the specified match group from all regex matches in each string in a string column. |
|
Retrieves the length for a UTF-8 string column |
|
Convert UTF-8 string to all lowercase |
|
Convert UTF-8 string to all uppercase |
|
Strip whitespace from the left side of a UTF-8 string |
|
Strip whitespace from the right side of a UTF-8 string |
|
Reverse a UTF-8 string |
|
Capitalize a UTF-8 string |
|
Gets the n (from nchars) left-most characters of each string |
|
Gets the n (from nchars) right-most characters of each string |
Temporal
Retrieves the date for a datetime column |
|
Retrieves the day for a datetime column |
|
Retrieves the hour for a datetime column |
|
Retrieves the month for a datetime column |
|
Retrieves the year for a datetime column |
|
Retrieves the day of the week for a datetime column, starting at 0 for Monday and ending at 6 for Sunday |
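The day-of-week convention described above (0 for Monday through 6 for Sunday) is the same one used by `datetime.weekday()` in Python's standard library. The plain-Python sketch below illustrates the components for a single value; it does not use Daft and is only meant to show the expected semantics.

```python
from datetime import datetime

# 2024-01-01 was a Monday; 2024-01-07 was the following Sunday.
monday = datetime(2024, 1, 1, 13, 30)
sunday = datetime(2024, 1, 7, 8, 0)

parts = {
    "date": monday.date(),
    "day": monday.day,
    "month": monday.month,
    "year": monday.year,
    "day_of_week": monday.weekday(),  # Monday -> 0
}

print(parts)
print(sunday.weekday())  # Sunday -> 6
```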
List
Joins every element of a list using the specified string delimiter |
|
Gets the length of each list |
|
Gets the element at an index in each list |
Struct
Retrieves one field from a struct column |
Image
Decodes the binary data in this column into images. |
|
Encode an image column as the provided image file format, returning a binary column of encoded bytes. |
|
Resize image into the provided width and height. |
|
Crops images with the provided bounding box |
Partitioning
Partitioning Transform that returns the number of days since epoch (1970-01-01) |
|
Partitioning Transform that returns the number of hours since epoch (1970-01-01) |
|
Partitioning Transform that returns the number of months since epoch (1970-01-01) |
|
Partitioning Transform that returns the number of years since epoch (1970-01-01) |
|
Partitioning Transform that returns the Hash Bucket following the Iceberg Specification of murmur3_32_x86 (https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements) |
|
Partitioning Transform that truncates the input to a standard width |
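To make the epoch-based transforms concrete, here is a plain-Python sketch of what they compute for a single value. It does not call Daft. The helper names are hypothetical, and the integer truncation rule is taken from the Iceberg partition-transform definition on the assumption that Daft follows it, which the text above states only for the bucket transform.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Whole days between 1970-01-01 and d."""
    return (d - EPOCH_DATE).days


def hours_since_epoch(ts: datetime) -> int:
    """Whole hours between 1970-01-01 00:00 and ts (naive timestamps)."""
    return (ts - EPOCH_DATETIME) // timedelta(hours=1)


def months_since_epoch(d: date) -> int:
    """Whole calendar months between 1970-01 and d's month."""
    return (d.year - 1970) * 12 + (d.month - 1)


def years_since_epoch(d: date) -> int:
    """Whole calendar years between 1970 and d's year."""
    return d.year - 1970


def truncate_int(value: int, width: int) -> int:
    """Round value down to a multiple of width (assumed Iceberg-style rule)."""
    return value - (value % width)


d = date(2024, 1, 1)
print(days_since_epoch(d), months_since_epoch(d), years_since_epoch(d))
print(hours_since_epoch(datetime(1970, 1, 2, 3, 59)))
print(truncate_int(17, 10), truncate_int(-1, 10))
```

Values that fall in the same day, hour, month, or year map to the same integer, which is what makes these transforms usable as partition keys.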
URLs
Treats each string as a URL, and downloads the bytes contents as a bytes column |
JSON
Queries JSON data in a column using a jq-style filter (https://jqlang.github.io/jq/manual/). This expression uses jaq as the underlying executor; see 01mf02/jaq for the full list of supported filters. |