Daft 0.0.20 Release Notes
The Daft 0.0.20 release adds many exciting new features and refactors. The highlights are:
Dynamic Scheduling
Rust Builds
New Features
Dynamic Scheduling
Great work by @xcharleslin to introduce a new dynamic scheduler for running Daft dataframe queries. This version of the scheduler allows Daft to perform much more intelligent pull-based scheduling, dramatically speeding up operations such as limit(5).show(), which will now only require materializing the first partition, even in the presence of other global operations such as a where.
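The pull-based idea can be illustrated with a toy sketch built on plain Python generators. This is not Daft's scheduler or API: scan, where, limit, and the materialized list are hypothetical names used only to show why a downstream limit stops upstream partitions from being computed.

```python
from itertools import islice
from typing import Callable, Iterator, List

# Records which partitions were actually computed (illustration only).
materialized: List[int] = []


def scan(num_partitions: int, rows_per_partition: int) -> Iterator[List[int]]:
    """Lazily yield partitions; nothing is computed until a consumer pulls."""
    for p in range(num_partitions):
        materialized.append(p)
        start = p * rows_per_partition
        yield list(range(start, start + rows_per_partition))


def where(
    partitions: Iterator[List[int]], predicate: Callable[[int], bool]
) -> Iterator[List[int]]:
    """Filter each partition as it is pulled through."""
    for part in partitions:
        yield [row for row in part if predicate(row)]


def limit(partitions: Iterator[List[int]], n: int) -> List[int]:
    """Pull rows until n are collected, then stop pulling from upstream."""
    rows = (row for part in partitions for row in part)
    return list(islice(rows, n))


# Four partitions of 100 rows each, filtered to even values, limited to 5 rows.
result = limit(where(scan(4, 100), lambda x: x % 2 == 0), 5)
print(result)
print(materialized)
```

Because limit pulls rows on demand, the first partition already supplies the five rows it needs, so the remaining three partitions are never computed.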
Add tests for #368 #375
Implement dynamic scheduling. #387
Genericize PartitionT in DynamicSchedule components #408
The dynamic scheduler is under active development and will be the default scheduler in the next release. You may use the dynamic scheduler locally by setting the environment variable DAFT_RUNNER=dynamic.
See: #204
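As a minimal sketch of setting this variable from inside a Python script rather than the shell: the snippet assumes Daft reads DAFT_RUNNER from the process environment when it selects a runner, so the assignment is placed before any Daft import or query.

```python
import os

# Assumption: Daft consults DAFT_RUNNER when choosing a runner, so the
# variable is set before Daft is imported or any query is run.
os.environ["DAFT_RUNNER"] = "dynamic"

# import daft  # import Daft and build dataframe queries after this point

print(os.environ["DAFT_RUNNER"])
```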
Rust Builds
Foundational work by @sammysidhu to move Daft’s internals to Rust. This release moves our current custom C++ kernels to Rust, already yielding some speedups in local benchmarking. More refactors to Rust are planned for the next release, providing increased type-safety and execution efficiency for Daft users.
Move to Rust from C++ for internal compute #385
Move rust kernels to module #389
Rust Wheel Publishing to PYPI and Anaconda Nightly #388
build rust library for building docs #391
Drop Poetry for doc building #390
[Rust wheel publish] Set manylinux version to auto for max compatibility #395
Add daft version from rust build #397
Fix build for Ray compatibility CI #402
Clean up Ray compatibility CI jobs Ray and Protobuf installations #407
Enhancements
Add memory_bytes to ResourceRequest and pass into Ray. #368
Move evaluation of UDF expression to ExpressionExecutor #380
Refactors to isolate code that uses PyArrow into vPartition #383
move RayPartitionSet import out of DataFrame top level and into ray_dataset_conversion #392
Fix retrieval of daft version in benchmarking suite #396
File read refactors #400
Remove caching from filesystems and benchmarking #411
Bug Fixes
Pin numpy < 1.24 to avoid partition bug for pylist #399