SQL#
- sql(sql: str, catalog: Optional[SQLCatalog] = None, register_globals: bool = True) DataFrame [source]#
Run a SQL query, returning the results as a DataFrame
Warning
This features is early in development and will likely experience API changes.
Examples
A simple example joining 2 dataframes together using a SQL statement, relying on Daft to detect the names of SQL tables using their corresponding Python variable names.
>>> import daft >>> >>> df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]}) >>> df2 = daft.from_pydict({"a": [1, 2, 3], "c": ["daft", None, None]}) >>> >>> # Daft automatically detects `df1` and `df2` from your Python global namespace >>> result_df = daft.sql("SELECT * FROM df1 JOIN df2 ON df1.a = df2.a") >>> result_df.show() ╭───────┬──────┬──────╮ │ a ┆ b ┆ c │ │ --- ┆ --- ┆ --- │ │ Int64 ┆ Utf8 ┆ Utf8 │ ╞═══════╪══════╪══════╡ │ 1 ┆ foo ┆ daft │ ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤ │ 2 ┆ bar ┆ None │ ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤ │ 3 ┆ baz ┆ None │ ╰───────┴──────┴──────╯ (Showing first 3 of 3 rows)
A more complex example using a SQLCatalog to create a named table called
"my_table"
, which can then be referenced from inside your SQL statement.>>> import daft >>> from daft.sql import SQLCatalog >>> >>> df = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]}) >>> >>> # Register dataframes as tables in SQL explicitly with names >>> catalog = SQLCatalog({"my_table": df}) >>> >>> daft.sql("SELECT a FROM my_table", catalog=catalog).show() ╭───────╮ │ a │ │ --- │ │ Int64 │ ╞═══════╡ │ 1 │ ├╌╌╌╌╌╌╌┤ │ 2 │ ├╌╌╌╌╌╌╌┤ │ 3 │ ╰───────╯ (Showing first 3 of 3 rows)
- Parameters:
sql (str) – SQL query to execute
catalog (SQLCatalog, optional) – Catalog of tables to use in the query. Defaults to None, in which case a catalog will be built from variables in the callers scope.
register_globals (bool, optional) – Whether to incorporate global variables into the supplied catalog, in which case a copy of the catalog will be made and the original not modified. Defaults to True.
- Returns:
Dataframe containing the results of the query
- Return type:
- sql_expr(sql: str) Expression [source]#
Parses a SQL string into a Daft Expression
This function allows you to create Daft Expressions from SQL snippets, which can then be used in Daft operations or combined with other Daft Expressions.
- Parameters:
sql (str) – A SQL string to be parsed into a Daft Expression.
- Returns:
A Daft Expression representing the parsed SQL.
- Return type:
Expression
Examples
Create a simple SQL expression:
>>> import daft >>> expr = daft.sql_expr("1 + 2") >>> print(expr) lit(1) + lit(2)
Use SQL expression in a Daft DataFrame operation:
>>> df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]}) >>> df = df.with_column("c", daft.sql_expr("a + b")) >>> df.show() ╭───────┬───────┬───────╮ │ a ┆ b ┆ c │ │ --- ┆ --- ┆ --- │ │ Int64 ┆ Int64 ┆ Int64 │ ╞═══════╪═══════╪═══════╡ │ 1 ┆ 4 ┆ 5 │ ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤ │ 2 ┆ 5 ┆ 7 │ ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤ │ 3 ┆ 6 ┆ 9 │ ╰───────┴───────┴───────╯ (Showing first 3 of 3 rows)
daft.sql_expr
is also called automatically for you in some DataFrame operations such as filters:>>> df = daft.from_pydict({"x": [1, 2, 3], "y": [4, 5, 6]}) >>> result = df.where("x < 3 AND y > 4") >>> result.show() ╭───────┬───────╮ │ x ┆ y │ │ --- ┆ --- │ │ Int64 ┆ Int64 │ ╞═══════╪═══════╡ │ 2 ┆ 5 │ ╰───────┴───────╯ (Showing first 1 of 1 rows)
SQL Functions#
This is a full list of functions that can be used from within SQL.
|
Create configurations to be used when accessing Azure Blob Storage. |
|
Create configurations to be used when accessing Google Cloud Storage. |
|
Create configurations for sending web requests. |
|
Create configurations to be used when accessing an S3-compatible system. |
|
Gets the absolute value of a number. |
|
Calculates the inverse cosine (arc cosine) of a number. |
|
Calculates the inverse hyperbolic cosine of a number. |
|
Joins elements of a list into a single string using a specified separator. |
|
Calculates the inverse sine (arc sine) of a number. |
|
Calculates the inverse hyperbolic sine of a number. |
|
Calculates the inverse tangent (arc tangent) of a number. |
|
Calculates the angle between the positive x-axis and the ray from (0,0) to (x,y). |
|
Calculates the inverse hyperbolic tangent of a number. |
|
Calculates the average (mean) of non-null elements in the input expression. |
|
Capitalizes the first character of the string |
|
Rounds a number up to the nearest integer. |
|
Clips a number to a specified range. |
|
Concatenate the inputs into a single string |
|
Returns true if the string contains the specified substring |
|
Calculates the cosine of an angle in radians. |
|
Calculates the cotangent of an angle in radians. |
|
Counts the number of non-null elements in the input expression. |
|
Counts the number of times a pattern, or multiple patterns, appears in the input. |
|
Extracts the date component from a datetime expression. |
|
Extracts the day component from a datetime expression. |
|
Extracts the day_of_week component from a datetime expression. |
|
Converts an angle from radians to degrees. |
|
Returns true if the string ends with the specified substring |
|
Calculates the exponential of a number (e^x). |
|
Expands a list column into multiple rows. |
|
Replaces NaN values in the input expression with a specified fill value. |
|
Returns the index of the first occurrence of a substring within the string |
|
Rounds a number down to the nearest integer. |
|
Hashes the values in the input expression. |
|
Extracts the hour component from a datetime expression. |
|
Crops an image to a specified bounding box. |
|
Decodes an image from binary data. |
|
Encodes an image into the specified image file format, returning a binary column of encoded bytes. |
|
Resizes an image to the specified width and height. |
|
Converts an image to the specified mode (e.g. |
|
Checks if the input expression is infinite (positive or negative infinity). |
|
Checks if the input expression is NaN (Not a Number). |
|
Extracts a JSON object from a JSON string using a JSONPath expression. |
|
Returns the specified number of leftmost characters from the string |
|
Returns the length of the string |
|
Returns the length of the string in bytes |
|
Splits a list into chunks of a specified size. |
|
Counts the number of elements in a list. |
|
Joins elements of a list into a single string using a specified separator. |
|
Returns the maximum value in a list. |
|
Calculates the mean (average) of values in a list. |
|
Returns the minimum value in a list. |
|
Extracts a portion of a list from a start index to an end index. |
|
Sorts the elements of a list in ascending or descending order. |
|
Calculates the sum of values in a list. |
|
Calculates the natural logarithm of a number. |
|
Calculates the natural logarithm of a number. |
|
Calculates the base-10 logarithm of a number. |
|
Calculates the base-2 logarithm of a number. |
|
Converts the string to lowercase |
|
Pads the string on the left side with the specified string until it reaches the specified length |
|
Removes leading whitespace from the string |
|
Retrieves the value associated with a given key from a map. |
|
Retrieves the value associated with a given key from a map. |
|
Finds the maximum value among non-null elements in the input expression. |
|
Calculates the average (mean) of non-null elements in the input expression. |
|
Finds the minimum value among non-null elements in the input expression. |
|
Calculates the minimum hash over the inputs ngrams, repeating with num_hashes permutations. |
|
Extracts the minute component from a datetime expression. |
|
Extracts the month component from a datetime expression. |
|
Normalizes a string for more useful deduplication and data cleaning. |
|
Checks if the input expression is not NaN (Not a Number). |
|
Extracts the number of days since epoch time from a datetime expression. |
|
Extracts the number of hours since epoch time from a datetime expression. |
|
Computes a bucket number for the input expression based the specified number of buckets using an Iceberg-specific hash. |
|
Truncates the input expression to a specified width. |
|
Extracts the number of months since epoch time from a datetime expression. |
|
Extracts the number of years since epoch time from a datetime expression. |
|
Converts an angle from degrees to radians. |
|
Extracts the first substring that matches the specified regular expression pattern |
|
Extracts all substrings that match the specified regular expression pattern |
|
Returns true if the string matches the specified regular expression pattern |
|
Replaces all occurrences of a substring with a new string |
|
Splits the string by the specified delimiter and returns an array of substrings |
|
Repeats the string the specified number of times |
|
Reverses the order of characters in the string |
|
Returns the specified number of rightmost characters from the string |
|
Rounds a number to a specified number of decimal places. |
|
Pads the string on the right side with the specified string until it reaches the specified length |
|
Removes trailing whitespace from the string |
|
Extracts the second component from a datetime expression. |
|
Returns the sign of a number (-1, 0, or 1). |
|
Calculates the sine of an angle in radians. |
|
Splits the string by the specified delimiter and returns an array of substrings |
|
Calculates the square root of a number. |
|
Returns true if the string starts with the specified substring |
|
Calculates the standard deviation of non-null elements in the input expression. |
|
Calculates the standard deviation of non-null elements in the input expression. |
|
Extracts a field from a struct expression by name. |
|
Extracts a field from a struct expression by name. |
|
Calculates the sum of non-null elements in the input expression. |
|
Calculates the tangent of an angle in radians. |
|
Extracts the time component from a datetime expression. |
|
Parses the string as a date using the specified format. |
|
Parses the string as a datetime using the specified format. |
|
Encodes each string as a list of integer tokens using a tokenizer. |
|
Decodes each list of integer tokens into a string using a tokenizer. |
|
Expands a list column into multiple rows. |
|
Converts the string to uppercase |
|
Extracts the year component from a datetime expression. |