Skip to content

Daft Sessions#

Warning

These APIs are early in their development. Please feel free to open feature requests and file issues. We'd love hear want you would like, thank you! 🤘

Sessions enable you to attach catalogs, tables, and create temporary objects which are accessible through both the Python and SQL APIs. Sessions hold configuration state such as current_catalog and current_namespace which are used in name resolution and can simplify your workflows.

Example#

import daft

# `import daft` defines an implicit session `daft.current_session()`

from daft import Session

# create a new session
sess = Session()

# create a temp table from a DataFrame
sess.create_temp_table("T", daft.from_pydict({ "x": [1,2,3] }))

# read table as dataframe from the session
_ = sess.read_table("T")

# get the table instance
t = sess.get_table("T")

# read table instance as a datadrame
_ = t.read()

# execute sql against the session
sess.sql("SELECT * FROM T").show()
╭───────╮
 x     
 ---   
 Int64 
╞═══════╡
 1     
├╌╌╌╌╌╌╌┤
 2     
├╌╌╌╌╌╌╌┤
 3     
╰───────╯

Usage#

This section covers detailed usage of the current APIs with some code snippets.

Setup#

Note

For these examples, we are using sqlite Iceberg which requires pyiceberg[sql-sqlite].

from daft import Catalog
from pyiceberg.catalog.sql import SqlCatalog

# don't forget to `mkdir -p /tmp/daft/example`
tmpdir = "/tmp/daft/example"

# create a pyiceberg catalog backed by sqlite
iceberg_catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{tmpdir}/catalog.db",
        "warehouse": f"file://{tmpdir}",
    },
)

# creating a daft catalog from the pyiceberg catalog implementation
catalog = Catalog.from_iceberg(iceberg_catalog)

# check
catalog.name
"""
'default'
"""

Session State#

Let's get started by creating an empty session and checking the state.

import daft

from daft import Session

# create a new empty session
sess = Session()

# check the current catalog (None)
sess.current_catalog()

# get the current namespace (None)
sess.current_namespace()

Attach & Detach#

The attach and detach methods make it easy to use catalogs and tables in a session. This example shows how we can attach our newly created catalog. When you attach a catalog to an empty session, it automatically becomes the current active catalog.

# attach makes it possible to use existing catalogs in the session
sess.attach(catalog)

# check the current catalog was set automatically
sess.current_catalog()
"""
Catalog('default')
"""

# detach would remove the catalog
# sess.detach_catalog("default")

Create & Drop#

We can create tables and namespaces directly through a catalog or via our session.

# create a namespace 'example'
sess.create_namespace("example")

# verify it was created
sess.list_namespaces()
"""
[Identifier('example')]
"""

# you can create an empty table with a schema
# sess.create_table("example.tbl", schema)

# but suppose we have some data..
df = daft.from_pydict({
    "x": [ True, True, False ],
    "y": [ 1, 2, 3 ],
    "z": [ "abc", "def", "ghi" ],
})

# create a table from the dataframe, which will create + append
sess.create_table("example.tbl", df)

# you can also create temporary tables from dataframes
# > echo "x,y,z\nFalse,4,jkl" > /tmp/daft/row.csv
sess.create_temp_table("temp", daft.read_csv("/tmp/daft/row.csv"))

# you can drop too
# sess.drop_table("example.tbl")
# sess.drop_namespace("example")

Read & Write#

Using sessions abstracts away underlying catalog and table implementations so you can easily read and write daft DataFrames.

# we can read our table back as a DataFrame instance
sess.read_table("example.tbl").show()
"""
╭─────────┬───────┬──────╮
│ x       ┆ y     ┆ z    │
│ ---     ┆ ---   ┆ ---  │
│ Boolean ┆ Int64 ┆ Utf8 │
╞═════════╪═══════╪══════╡
│ true    ┆ 1     ┆ abc  │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ true    ┆ 2     ┆ def  │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ false   ┆ 3     ┆ ghi  │
╰─────────┴───────┴──────╯
"""

# create a single row to append
row = daft.from_pylist([{ "x": True, "y": 4, "z": "jkl" }])

# we can write a DataFrame to the Table via the Session
sess.write_table("example.tbl", row, mode="append")

# we can use session state and table objects!
sess.set_namespace("example")

# name resolution is trivial
tbl = sess.get_table("tbl")

# to read, we have .read() .select(*cols) or .show()
tbl.show()
"""
╭─────────┬───────┬──────╮
│ x       ┆ y     ┆ z    │
│ ---     ┆ ---   ┆ ---  │
│ Boolean ┆ Int64 ┆ Utf8 │
╞═════════╪═══════╪══════╡
│ true    ┆ 4     ┆ jkl  │  <--- `row` was inserted
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ true    ┆ 1     ┆ abc  │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ true    ┆ 2     ┆ def  │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ false   ┆ 3     ┆ ghi  │
╰─────────┴───────┴──────╯
"""

# to write, we have .write(df, mode), .append(df), .overwrite(df)
tbl.append(row)

# row is now inserted twice
tbl.show()
"""
╭─────────┬───────┬──────╮
│ x       ┆ y     ┆ z    │
│ ---     ┆ ---   ┆ ---  │
│ Boolean ┆ Int64 ┆ Utf8 │
╞═════════╪═══════╪══════╡
│ true    ┆ 4     ┆ jkl  │ <-- append via tbl.append(...)
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ true    ┆ 4     ┆ jkl  │ <-- append via sess.write(...)
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ true    ┆ 1     ┆ abc  │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ true    ┆ 2     ┆ def  │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ false   ┆ 3     ┆ ghi  │
╰─────────┴───────┴──────╯
"""

Using SQL#

The session enables executing Daft SQL against your catalogs.

# use the given catalog and namespace (like the use/set_ methods)
sess.sql("USE default.example")

# we support both qualified and unqualified names by leveraging the session state
sess.sql("SELECT * FROM tbl LIMIT 1").show()
╭─────────┬───────┬──────╮
 x        y      z    
 ---      ---    ---  
 Boolean  Int64  Utf8 
╞═════════╪═══════╪══════╡
 true     4      jkl  
╰─────────┴───────┴──────╯

# we can even combine our queries with the temp table from earlier
sess.sql("SELECT * FROM example.tbl, temp LIMIT 1").show()
╭─────────┬───────┬──────┬────────────┬────────┬────────╮
 x        y      z                temp.y  temp.z 
 ---      ---    ---               ---     ---    
 Boolean  Int64  Utf8  (1 hidden)  Int64   Utf8   
╞═════════╪═══════╪══════╪════════════╪════════╪════════╡
 true     4      jkl              4       jkl    
╰─────────┴───────┴──────┴────────────┴────────┴────────╯

Note

We aim to support SQL DDL in future releases!

Reference#

For complete documentation, please see the Session API docs.

Method Description
attach Attaches a catalog or table to this session.
attach_catalog Attaches (or creates) a catalog to this session
attach_table Attaches (or creates) a table to this session
create_namespace Creates a new namespace
create_table Creates a new table from the source
create_temp_table Creates a temp table scoped to this session from an existing view.
current_catalog Returns the session's current catalog.
current_namespace Returns the session's current namespace.
detach_catalog Detaches the catalog from this session
detach_table Detaches the table from this session
drop_namespace Drop the namespace in the session's current catalog
drop_table Drop the table in the session's current catalog
get_catalog Returns the catalog or an object not found error.
get_table Returns the table or an object not found error.
has_catalog Returns true iff the session has access to a matching catalog.
has_table Returns true iff the session has access to a matching table.
list_catalogs Lists all catalogs matching the pattern.
list_namespaces Lists all namespaces matching the pattern.
list_tables Lists all tables matching the pattern.
read_table Reads a table from the session.
write_table Writes a dataframe to the table.
set_catalog Sets the current catalog.
set_namespace Sets the current namespace.
sql Executes SQL against the session.
use Sets the current catalog and namespace.