Unified Engine for Data Engineering, Analytics & ML/AI

Written in Rust, Daft exposes both SQL and Python DataFrame interfaces as first-class citizens. It provides a snappy, delightful local interactive experience and scales seamlessly to petabyte-scale distributed workloads.

USE CASES

Daft exposes both SQL and Python DataFrame interfaces for:

Combine the performance of DuckDB, the Pythonic UX of Polars, and the scalability of Apache Spark for data engineering from MB to PB scale

  • Scale ETL workflows effortlessly from local to distributed environments
  • Enjoy a Python-first experience without JVM dependency hell
  • Leverage native integrations with cloud storage, open catalogs, and data formats
                          
# Aggregate data in an ETL job and write to Delta Lake

import daft

df = daft.from_pydict(
    {
        "product": ["keyboard", "mouse", "monitor", "keyboard"],
        "price": [28.5, 10.99, 100.99, 23.8],
        "quantity": [2, 5, 1, 3],
    }
)
df = df.groupby("product").agg(
    daft.col("price").mean().alias("avg_price"),
    daft.col("quantity").sum().alias("total_quantity"),
)
df.write_deltalake("aggregate_sales")
                          
                          

Blend the snappiness of DuckDB with the scalability of Spark/Trino for unified local and distributed analytics

  • Utilize complementary SQL and Python interfaces for versatile analytics
  • Perform snappy local exploration with DuckDB-like performance
  • Seamlessly scale to the cloud, outperforming distributed engines like Spark and Trino
                          
# Query parquet data from S3 using SQL

import daft

daft.set_planning_config(
    default_io_config=daft.io.IOConfig(s3=daft.io.S3Config(anonymous=True))
)
df = daft.read_parquet(
    "s3://daft-public-data/nyc-taxi-dataset-2023-jan-deltalake/tpep_pickup_day=1/"
)
df = daft.sql(
    """SELECT
    VendorID,
    trip_distance,
    total_amount
FROM
    df
WHERE
    trip_distance > 1.0
    AND total_amount > 10.0"""
)
df.show()
                          
                          

Streamline ML/AI workflows with efficient data loading from open formats like Parquet and JPEG

  • Load data efficiently from open formats directly into PyTorch or NumPy
  • Schedule large-scale model batch inference on distributed GPU clusters
  • Optimize data curation with advanced clustering, deduplication, and filtering
                          
# Load images from URLs and decode them into PyTorch tensors

import daft

daft.set_planning_config(
    default_io_config=daft.io.IOConfig(s3=daft.io.S3Config(anonymous=True))
)
df = daft.from_pydict(
    {
        "image_urls": [
            "s3://daft-public-data/open-images/validation-images/0001eeaf4aed83f9.jpg",
            "https://www.getdaft.io/_static/daftLogo.png",
            "https://www.getdaft.io/_static/use-cases-1.png",
        ],
    }
)
df = df.with_column("data", df["image_urls"].url.download())
df = df.with_column("images", df["data"].image.decode())

pytorch_ds = df.to_torch_iter_dataset()
for batch in pytorch_ds:
    images = batch["images"]
    print(images.shape)
                          
                          

TRUSTED BY

"Amazon uses Daft to manage exabytes of Apache Parquet in our Amazon S3-based data catalog. Daft improved the efficiency of one of our most critical data processing jobs by over 24%, saving over 40,000 years of Amazon EC2 vCPU computing time annually."

Patrick Ames

Principal Engineer @ Amazon

"Daft has dramatically improved our 100TB+ text data pipelines, speeding up workloads such as fuzzy deduplication by 10x. Jobs previously built using custom code on Ray/Polars has been replaced by simple Daft queries, running on internet-scale unstructured datasets."

Maurice Weber

PhD AI Researcher @ Together AI

"Daft was incredible at large volumes of abnormally shaped workloads - I pointed it at 16,000 small Parquet files in a self-hosted S3 service and it just worked! It's the data engine built for the cloud and AI workloads."

Tony Wang

Data @ Anthropic, PhD @ Stanford

"Daft as an alternative to Spark has changed the way we think about data on our ML Platform. Its tight integrations with Ray lets us maintain a unified set of infrastructure while improving both query performance and developer productivity. Less is more."

Alexander Filipchik

Head Of Infrastructure at City Storage Systems (CloudKitchens)

CAPABILITIES

Blazing efficiency, designed for multimodal data.

Integrate with ML/AI libraries.

Daft plugs directly into your ML/AI stack through efficient zero-copy integrations with essential Python libraries such as PyTorch and Ray. It also lets you request GPUs as a resource for running models.
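
As a minimal sketch of handing data off to those libraries (assuming DataFrame.to_torch_iter_dataset and DataFrame.to_ray_dataset as in recent Daft releases, with PyTorch and Ray installed; column names are illustrative):

# Hand a Daft DataFrame to PyTorch and Ray (sketch)

import daft

df = daft.from_pydict(
    {
        "feature": [0.1, 0.2, 0.3],
        "label": [0, 1, 0],
    }
)

# Iterable dataset that can be consumed by a PyTorch training loop
torch_ds = df.to_torch_iter_dataset()

# Ray Dataset for further distributed processing (requires a Ray installation)
ray_ds = df.to_ray_dataset()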

read more

Go Distributed and Out-of-Core.

Daft runs locally with a lightweight multithreaded backend. When your local machine is no longer sufficient, it scales seamlessly to run out-of-core on a distributed cluster.
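
A minimal sketch of switching from the default local runner to a Ray cluster (the cluster address, bucket path, and column names are placeholders; daft.context.set_runner_ray is assumed to match your Daft version):

# Run the same query on a Ray cluster instead of the local multithreaded runner

import daft

# Point Daft at an existing Ray cluster (placeholder address);
# omitting the address starts Ray locally
daft.context.set_runner_ray(address="ray://head-node:10001")

df = daft.read_parquet("s3://my-bucket/events/*.parquet")  # placeholder path
df = df.groupby("event_type").agg(daft.col("amount").sum().alias("total_amount"))
df.show()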

read more

Execute Complex Operations.

Daft supports User-Defined Functions (UDFs) on DataFrame columns, letting you apply complex expressions and operations to Python objects with the full flexibility ML/AI workloads require.
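
A minimal sketch of a UDF (the function name and columns are illustrative; the @daft.udf decorator with a return_dtype and Series.to_pylist() are assumed to match your Daft version):

# Apply arbitrary Python logic to a column with a UDF

import daft


@daft.udf(return_dtype=daft.DataType.string())
def normalize_name(names):
    # The column arrives as a daft.Series; drop to plain Python for full flexibility
    return [n.strip().title() for n in names.to_pylist()]


df = daft.from_pydict({"name": ["  ada lovelace ", "alan TURING"]})
df = df.with_column("clean_name", normalize_name(df["name"]))
df.show()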

read more

Native Support for Cloud Storage.

Daft natively reads from and writes to cloud object storage such as AWS S3, Azure Blob Storage, and Google Cloud Storage. Its Rust-based async I/O delivers high-throughput access without extra connector libraries, configured through a simple IOConfig.
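
A minimal sketch of configuring credentials explicitly and passing them per read (the S3Config field names, bucket, and credentials below are assumptions/placeholders; check the I/O configuration docs for your Daft version):

# Read private S3 data with explicit credentials instead of the default credential chain

import daft

io_config = daft.io.IOConfig(
    s3=daft.io.S3Config(
        region_name="us-west-2",             # placeholder region
        key_id="YOUR_ACCESS_KEY_ID",         # placeholder credentials
        access_key="YOUR_SECRET_ACCESS_KEY",
    )
)

df = daft.read_parquet("s3://my-private-bucket/data/*.parquet", io_config=io_config)
df.show()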

read more

Deliver Unmatched Speed.

Underneath its Python API, Daft is built in blazing fast Rust code. Rust powers Daft’s vectorized execution and async I/O, allowing Daft to outperform frameworks such as Spark.

read more

ECOSYSTEM

Integrations? We'll build it.

Data Science and Machine Learning

Storage and Infrastructure


COMMUNITY

Get updates, contribute code, or say hi!

We hold contributor syncs on the last Thursday of every month to discuss new features and technical deep dives. Add them to your calendar.

Daft Engineering Blog

Join us as we explore innovative ways to handle vast datasets, optimize performance, and revolutionize your data workflows!

Take the next step with an easy tutorial.

MNIST Digit Classification

Use a simple deep learning model to run classification on the MNIST image dataset.

see tutorial

Running LLMs on the Red Pajamas Dataset

Perform a similarity search on Stack Exchange questions using language models and embeddings.

see tutorial

Querying Images with UDFs

Query the Open Images dataset to retrieve the top N "reddest" images using NumPy and Pillow inside Daft UDFs.

see tutorial

Image Generation on GPUs

Generate images from text prompts using a deep learning model (Mini DALL-E) and Daft UDFs.

see tutorial