Daft: The Distributed Python Dataframe
Daft is a fast and scalable Python dataframe for Complex Data and Machine Learning workloads.
Installing Daft is simple with pip, the standard Python package installer.
Daft is open source and lets you use any Python library to process data in a dataframe. It also integrates with many other open source technologies, plugging directly into your existing infrastructure and systems.
Data Science and Machine Learning
Data Science Experimentation
Daft lets data scientists and engineers work from their preferred Python notebook environment for interactive experimentation on complex data.
Complex Data Warehousing
The Daft Python dataframe efficiently pipelines complex data from raw data lakes to clean, queryable datasets for analysis and reporting.
Machine Learning Training Dataset Curation
Machine Learning Model Evaluation
Evaluating the performance of machine learning systems is challenging, but Daft Python dataframes make it easy to run models and SQL-style analyses at scale.
Daft supports running User-Defined Functions (UDFs) on columns of Python objects: if Python supports it, Daft can handle it!
Daft embraces Python's dynamic and interactive nature, enabling fast, iterative experimentation on data in your notebook and on your laptop.
Daft integrates with frameworks such as Ray to run petabyte-scale dataframes on a cluster of machines in the cloud.