daft.read_sql

Contents

daft.read_sql#

daft.read_sql(sql: str, conn: Union[Callable[[], Connection], str], partition_col: Optional[str] = None, num_partitions: Optional[int] = None) DataFrame[source]#

Creates a DataFrame from the results of a SQL query.

Parameters:
  • sql (str) – SQL query to execute

  • conn (Union[Callable[[], Connection], str]) – SQLAlchemy connection factory or database URL

  • partition_col (Optional[str]) – Column to partition the data by, defaults to None

  • num_partitions (Optional[int]) – Number of partitions to read the data into, defaults to None, which will lets Daft determine the number of partitions.

Returns:

Dataframe containing the results of the query

Return type:

DataFrame

Note

  1. Partitioning:

    When partition_col is specified, the function partitions the query based on that column. You can define num_partitions or leave it to Daft to decide. Daft calculates the specified column’s percentiles to determine partitions (e.g., for num_partitions=3, it uses the 33rd and 66th percentiles). If the database or column type lacks percentile calculation support, Daft partitions the query using equal ranges between the column’s minimum and maximum values.

  2. Execution:

    Daft executes SQL queries using using ConnectorX or SQLAlchemy, preferring ConnectorX unless a SQLAlchemy connection factory is specified or the database dialect is unsupported by ConnectorX.

Example

Read data from a SQL query and a database URL:

>>> df = daft.read_sql("SELECT * FROM my_table", "sqlite:///my_database.db")

Read data from a SQL query and a SQLAlchemy connection factory:

>>> def create_conn():
...     return sqlalchemy.create_engine("sqlite:///my_database.db").connect()
>>> df = daft.read_sql("SELECT * FROM my_table", create_conn)

Read data from a SQL query and partition the data by a column:

>>> df = daft.read_sql(
...     "SELECT * FROM my_table",
...     "sqlite:///my_database.db",
...     partition_col="id"
... )

Read data from a SQL query and partition the data into 3 partitions:

>>> df = daft.read_sql(
...     "SELECT * FROM my_table",
...     "sqlite:///my_database.db",
...     partition_col="id",
...     num_partitions=3
... )