Daft Documentation#
Daft is a unified data engine for data engineering, analytics and ML/AI.
Daft exposes both SQL and Python DataFrame interfaces as first-class citizens and is written in Rust.
Daft provides a snappy, delightful local interactive experience and also scales seamlessly to petabyte-scale distributed workloads.
Use-Cases#
Data Engineering#
Combine the performance of DuckDB, the Pythonic UX of Polars, and the scalability of Apache Spark for data engineering at MB to PB scale
Scale ETL workflows effortlessly from local to distributed environments
Enjoy a Python-first experience without JVM dependency hell
Leverage native integrations with cloud storage, open catalogs, and data formats
Data Analytics#
Blend the snappiness of DuckDB with the scalability of Spark/Trino for unified local and distributed analytics
Utilize complementary SQL and Python interfaces for versatile analytics
Perform snappy local exploration with DuckDB-like performance
Seamlessly scale to the cloud, outperforming distributed engines like Spark and Trino
ML/AI#
Streamline ML/AI workflows with efficient dataloading from open formats like Parquet and JPEG
Load data efficiently from open formats directly into PyTorch or NumPy
Schedule large-scale model batch inference on distributed GPU clusters
Optimize data curation with advanced clustering, deduplication, and filtering
Technology#
Daft boasts strong integrations with technologies common across these workloads:
Cloud Object Storage: record-setting I/O performance for integrations with S3 cloud storage, battle-tested at exabyte scale at Amazon
ML/AI Python Ecosystem: first-class integrations with PyTorch and NumPy for efficient interoperability with your ML/AI stack
Data Catalogs/Table Formats: capabilities to effectively query table formats such as Apache Iceberg, Delta Lake and Apache Hudi
Seamless Data Interchange: zero-copy integration with Apache Arrow
Multimodal/ML Data: native functionality for data modalities such as tensors, images, URLs, long-form text and embeddings
Installing Daft#
To install Daft, run this from your terminal:
pip install getdaft
Learn about more advanced installation options in our Installation Guide.
Learning Daft#
Quickstart Notebook: get up and running with Daft in under 10 minutes
Daft User Guide: material for learning key Daft concepts
Daft API Documentation: Python API documentation for reference
Frequently Asked Questions#
How do I know if Daft is the right framework for me?: Dataframe Comparison
How does Daft perform at large scales vs other data engines?: Benchmarks
What is the technical architecture of Daft?: Technical Architecture
Does Daft perform any telemetry?: Telemetry