Daft Documentation
Daft is a distributed query engine for large-scale data processing in Python, implemented in Rust.
Familiar interactive API: a lazy Python DataFrame for rapid, interactive iteration
Focus on the what: a powerful query optimizer rewrites your queries to be as efficient as possible
Data catalog integrations: full integration with data catalogs such as Apache Iceberg
Rich multimodal type system: supports multimodal types such as Images, URLs, Tensors, and more
Seamless interchange: built on the Apache Arrow in-memory format
Built for the cloud: record-setting I/O performance for integrations with S3 cloud storage
Installing Daft
To install Daft, run this from your terminal:
pip install getdaft
See the Installation Guide for more advanced installation options.
Learning Daft
Quickstart Notebook: get up and running with Daft in under 10 minutes!
Daft User Guide: learning material covering key Daft concepts
Daft API Documentation: reference documentation for the Python API
Frequently Asked Questions
How do I know if Daft is the right framework for me? See Dataframe Comparison.
How does Daft perform at large scale compared with other data engines? See Benchmarks.
What is the technical architecture of Daft? See Technical Architecture.
Does Daft perform any telemetry? See Telemetry.