Expressions
Daft Expressions describe a computation to be performed on the data in a DataFrame.
This page provides an overview of all the functionality that Daft Expressions provide.
Constructors
Generic
Converts multiple input expressions or column names into a struct.
Gives the expression a new name, which is its column's name in the DataFrame schema and the name by which subsequent expressions can refer to the results of this expression.
Casts an expression to the given datatype if possible.
Conditionally chooses values from two expressions, using the current boolean expression as the condition.
Checks if values in the Expression are Null (a special value indicating missing data).
Checks if values in the Expression are not Null (a special value indicating missing data).
Fills null values in the Expression with the provided fill_value.
Hashes the values in the Expression.
Applies a function to each value in a given expression.
Numeric
The absolute value of a numeric expression.
Adds two numeric expressions or concatenates two string expressions.
Subtracts two numeric expressions.
Multiplies two numeric expressions.
True divides two numeric expressions.
Takes the modulus of two numeric expressions.
Shifts the bits of an integer expression to the left.
Shifts the bits of an integer expression to the right.
The ceiling of a numeric expression.
The floor of a numeric expression.
The sign of a numeric expression.
The rounded value of a numeric expression.
Clips an expression to the given minimum and maximum values.
The square root of a numeric expression.
The cube root of a numeric expression.
The elementwise sine of a numeric expression.
The elementwise cosine of a numeric expression.
The elementwise tangent of a numeric expression.
The elementwise cotangent of a numeric expression.
The elementwise arc sine of a numeric expression.
The elementwise arc cosine of a numeric expression.
The elementwise arc tangent of a numeric expression.
Calculates the four-quadrant arctangent of coordinates (y, x), in radians.
The elementwise inverse hyperbolic tangent of a numeric expression.
The elementwise inverse hyperbolic cosine of a numeric expression.
The elementwise inverse hyperbolic sine of a numeric expression.
Converts a numeric expression from degrees to radians, elementwise.
Converts a numeric expression from radians to degrees, elementwise.
The elementwise log base 2 of a numeric expression.
The elementwise log base 10 of a numeric expression.
The elementwise logarithm with the given base of a numeric expression.
The elementwise natural log of a numeric expression.
The elementwise exponential (e raised to the power of each value) of a numeric expression.
Logical
Inverts a boolean expression.
Takes the logical AND of two boolean expressions, or the bitwise AND of two integer expressions.
Takes the logical OR of two boolean expressions, or the bitwise OR of two integer expressions.
Checks whether an expression is less than another.
Checks whether an expression is less than or equal to another.
Checks whether an expression is equal to another.
Checks whether an expression is not equal to another.
Checks whether an expression is greater than another.
Checks whether an expression is greater than or equal to another.
Checks if values in the Expression are between lower and upper, inclusive.
Checks if values in the Expression are in the provided list.
Runs the MinHash algorithm on the series.
Aggregation
The following can be used with DataFrame.agg or GroupedDataFrame.agg:
Counts the number of values in the expression.
Calculates the sum of the values in the expression.
Calculates the mean of the values in the expression.
Calculates the standard deviation of the values in the expression.
Calculates the minimum value in the expression.
Calculates the maximum value in the expression.
Returns any value in the expression.
Aggregates the values in the expression into a list.
Aggregates the values in the expression into a single string by concatenating them.
Calculates the approximate percentile(s) for a column of numeric values.
Calculates the approximate number of non-
Strings
The following methods are available under the expr.str attribute.
Checks whether each string contains the given pattern in a string column.
Checks whether each string matches the given regular expression pattern in a string column.
Checks whether each string starts with the given pattern in a string column.
Checks whether each string ends with the given pattern in a string column.
Concatenates two string expressions together.
Splits each string on the given literal or regex pattern, into a list of strings.
Extracts the specified match group from the first regex match in each string in a string column.
Extracts the specified match group from all regex matches in each string in a string column.
Replaces all occurrences of a pattern in a string column with a replacement string.
Retrieves the length of each string in a UTF-8 string column.
Retrieves the length of each string in a UTF-8 string column, in bytes.
Converts each UTF-8 string to lowercase.
Converts each UTF-8 string to uppercase.
Strips whitespace from the left side of each UTF-8 string.
Strips whitespace from the right side of each UTF-8 string.
Reverses each UTF-8 string.
Capitalizes each UTF-8 string.
Gets the nchars left-most characters of each string.
Gets the nchars right-most characters of each string.
Returns the index of the first occurrence of the substring in each string.
Right-pads each string to the given length, truncating it or padding it with the given character.
Left-pads each string to the given length, truncating it on the right or padding it with the given character.
Repeats each string n times.
Checks whether each string matches the given SQL LIKE pattern, case-sensitive.
Checks whether each string matches the given SQL LIKE pattern, case-insensitive.
Extracts a substring from each string, starting at a specified index and extending for a given length.
Converts each string to a date using the specified format.
Converts each string to a datetime using the specified format and timezone.
Normalizes each string for more useful deduplication.
Encodes each string as a list of integer tokens using a tokenizer.
Decodes each list of integer tokens into a string using a tokenizer.
Counts the number of times a pattern, or multiple patterns, appear in each string.
Floats
The following methods are available under the expr.float attribute.
Checks if values in the Expression are infinity.
Checks if values are NaN (a special float value indicating not-a-number).
Checks if values are not NaN (a special float value indicating not-a-number).
Fills NaN values in the Expression with the provided fill_value.
Temporal
Retrieves the date for a datetime column.
Retrieves the day for a datetime column.
Retrieves the minute for a datetime column.
Retrieves the second for a datetime column.
Retrieves the time for a datetime column.
Retrieves the day for a datetime column.
Retrieves the month for a datetime column.
Retrieves the year for a datetime column.
Retrieves the day of the week for a datetime column, starting at 0 for Monday and ending at 6 for Sunday.
Truncates the datetime column to the specified interval.
List
Splits each list into chunks of the given size.
Counts the number of elements in each list.
Gets the element at an index in each list.
Joins every element of a list using the specified string delimiter.
Gets the length of each list.
Calculates the maximum of each list.
Calculates the mean of each list.
Calculates the minimum of each list.
Gets a subset of each list.
Sorts the inner lists of a list column.
Sums each list.
Counts the occurrences of each unique value in the list.
Struct
Retrieves one field from a struct column |
Map
Retrieves the value for a key in a map column |
Image
Decodes the binary data in this column into images.
Encodes an image column in the provided image file format, returning a binary column of encoded bytes.
Resizes images to the provided width and height.
Crops images with the provided bounding box.
Partitioning
Partitioning transform that returns the number of days since the epoch (1970-01-01).
Partitioning transform that returns the number of hours since the epoch (1970-01-01).
Partitioning transform that returns the number of months since the epoch (1970-01-01).
Partitioning transform that returns the number of years since the epoch (1970-01-01).
Partitioning transform that returns the hash bucket, following the Iceberg specification of murmur3_32_x86: https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements
Partitioning transform that truncates the input to a standard width.
URLs
Treats each string as a URL and downloads its contents as a bytes column.
JSON
Queries JSON data in a column using a JQ-style filter (https://jqlang.github.io/jq/manual/). This expression uses jaq as the underlying executor; see 01mf02/jaq for the full list of supported filters.
Embedding
Computes the cosine distance between two embeddings.