daft.read_sql

daft.read_sql(sql: str, conn: Union[Callable[[], Connection], str], partition_col: Optional[str] = None, num_partitions: Optional[int] = None, disable_pushdowns_to_sql: bool = False) → DataFrame

Creates a DataFrame from the results of a SQL query.

Parameters:
  • sql (str) – SQL query to execute

  • conn (Union[Callable[[], Connection], str]) – SQLAlchemy connection factory or database URL

  • partition_col (Optional[str]) – Column to partition the data by, defaults to None

  • num_partitions (Optional[int]) – Number of partitions to read the data into, defaults to None, which lets Daft determine the number of partitions.

  • disable_pushdowns_to_sql (bool) – Whether to disable pushdowns to the SQL query, defaults to False

Returns:

DataFrame containing the results of the query

Return type:

DataFrame

Note

  1. Supported databases:

    Daft uses SQLGlot to build SQL queries, so it supports all databases that SQLGlot supports.

  2. Partitioning:

    When partition_col is specified, the function partitions the query based on that column. You can define num_partitions or leave it to Daft to decide. Daft calculates the specified column’s percentiles to determine partitions (e.g., for num_partitions=3, it uses the 33rd and 66th percentiles). If the database or column type lacks percentile calculation support, Daft partitions the query using equal ranges between the column’s minimum and maximum values (a conceptual sketch of this range-based fallback appears at the end of the examples below).

  3. Execution:

    Daft executes SQL queries using ConnectorX or SQLAlchemy, preferring ConnectorX unless a SQLAlchemy connection factory is specified or the database dialect is unsupported by ConnectorX.

  4. Pushdowns:

    Daft pushes down operations such as filtering, projections, and limits into the SQL query when possible. You can disable pushdowns by setting disable_pushdowns_to_sql=True, which will execute the SQL query as is.
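
    For instance, a filter and limit applied to the resulting DataFrame are candidates for pushdown into the generated SQL, while passing disable_pushdowns_to_sql=True runs the query unchanged (a minimal sketch, assuming Daft's DataFrame.where, DataFrame.limit, and daft.col):

    >>> import daft
    >>> df = daft.read_sql("SELECT * FROM my_table", "sqlite:///my_database.db")
    >>> df = df.where(daft.col("id") > 100).limit(10)  # eligible for pushdown into the SQL query
    >>> df_as_is = daft.read_sql(
    ...     "SELECT * FROM my_table",
    ...     "sqlite:///my_database.db",
    ...     disable_pushdowns_to_sql=True,
    ... )  # query runs as written; any filters or limits are applied by Daft afterwards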

Example

Read data from a SQL query and a database URL:

>>> import daft
>>> df = daft.read_sql("SELECT * FROM my_table", "sqlite:///my_database.db")

Read data from a SQL query and a SQLAlchemy connection factory:

>>> import sqlalchemy
>>> def create_conn():
...     return sqlalchemy.create_engine("sqlite:///my_database.db").connect()
>>> df = daft.read_sql("SELECT * FROM my_table", create_conn)

Read data from a SQL query and partition the data by a column:

>>> df = daft.read_sql(
...     "SELECT * FROM my_table",
...     "sqlite:///my_database.db",
...     partition_col="id"
... )

Read data from a SQL query and partition the data into 3 partitions:

>>> df = daft.read_sql(
...     "SELECT * FROM my_table",
...     "sqlite:///my_database.db",
...     partition_col="id",
...     num_partitions=3
... )
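
For intuition on the range-based fallback described in the partitioning note, here is a conceptual sketch (not Daft's actual implementation) of splitting a column into equal ranges with SQLAlchemy; the table and column names are illustrative:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine("sqlite:///my_database.db")
>>> with engine.connect() as conn:
...     lo, hi = conn.execute(sqlalchemy.text("SELECT MIN(id), MAX(id) FROM my_table")).one()
>>> num_partitions = 3
>>> step = (hi - lo) / num_partitions
>>> bounds = [lo + i * step for i in range(1, num_partitions)]
>>> # Each partition then corresponds to a disjoint range query, for example:
>>> #   SELECT * FROM my_table WHERE id < :bound_0
>>> #   SELECT * FROM my_table WHERE id >= :bound_0 AND id < :bound_1
>>> #   SELECT * FROM my_table WHERE id >= :bound_1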