daft.read_sql#
- daft.read_sql(sql: str, conn: Union[Callable[[], Connection], str], partition_col: Optional[str] = None, num_partitions: Optional[int] = None, disable_pushdowns_to_sql: bool = False) DataFrame [source]#
Creates a DataFrame from the results of a SQL query.
- Parameters:
sql (str) – SQL query to execute
conn (Union[Callable[[], Connection], str]) – SQLAlchemy connection factory or database URL
partition_col (Optional[str]) – Column to partition the data by, defaults to None
num_partitions (Optional[int]) – Number of partitions to read the data into, defaults to None, which will lets Daft determine the number of partitions.
disable_pushdowns_to_sql (bool) – Whether to disable pushdowns to the SQL query, defaults to False
- Returns:
Dataframe containing the results of the query
- Return type:
Note
- Supported databases:
Daft uses SQLGlot to build SQL queries, so it supports all databases that SQLGlot supports.
- Partitioning:
When
partition_col
is specified, the function partitions the query based on that column. You can definenum_partitions
or leave it to Daft to decide. Daft calculates the specified column’s percentiles to determine partitions (e.g., fornum_partitions=3
, it uses the 33rd and 66th percentiles). If the database or column type lacks percentile calculation support, Daft partitions the query using equal ranges between the column’s minimum and maximum values.
- Execution:
Daft executes SQL queries using using ConnectorX or SQLAlchemy, preferring ConnectorX unless a SQLAlchemy connection factory is specified or the database dialect is unsupported by ConnectorX.
- Pushdowns:
Daft pushes down operations such as filtering, projections, and limits into the SQL query when possible. You can disable pushdowns by setting
disable_pushdowns_to_sql=True
, which will execute the SQL query as is.
Example
Read data from a SQL query and a database URL:
>>> df = daft.read_sql("SELECT * FROM my_table", "sqlite:///my_database.db")
Read data from a SQL query and a SQLAlchemy connection factory:
>>> def create_conn(): ... return sqlalchemy.create_engine("sqlite:///my_database.db").connect() >>> df = daft.read_sql("SELECT * FROM my_table", create_conn)
Read data from a SQL query and partition the data by a column:
>>> df = daft.read_sql( ... "SELECT * FROM my_table", ... "sqlite:///my_database.db", ... partition_col="id" ... )
Read data from a SQL query and partition the data into 3 partitions:
>>> df = daft.read_sql( ... "SELECT * FROM my_table", ... "sqlite:///my_database.db", ... partition_col="id", ... num_partitions=3 ... )