Daft 0.0.23 Release Notes
The Daft 0.0.23 release has some amazing highlights:
Our Dynamic runners are now the default runners, thanks to @xcharleslin.
Our first PR from @Asrst: added DataFrame.count_rows() to get the length of a DataFrame!
New Features
DataFrame.count_rows()
Adds an API to count the number of rows in a DataFrame, along with a __len__ method. Thanks to @Asrst for the PR!
See: #553
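In Daft itself the call is simply df.count_rows(). To illustrate why row counting is a DataFrame operation rather than a plain attribute, here is a toy, pure-Python sketch. ToyDataFrame and its partition layout are hypothetical and are not Daft's internals; the sketch only shows the general idea that a partitioned DataFrame can count rows by summing per-partition lengths. Having __len__ delegate to count_rows() is likewise an assumption of this sketch.

```python
# Hypothetical illustration only: a toy partitioned table, NOT Daft's implementation.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDataFrame:
    # Each partition maps column name -> list of values (hypothetical layout).
    partitions: List[Dict[str, list]] = field(default_factory=list)

    def count_rows(self) -> int:
        total = 0
        for part in self.partitions:
            # A partition's row count is the length of any one of its columns;
            # an empty partition dict contributes zero rows.
            total += len(next(iter(part.values()))) if part else 0
        return total

    def __len__(self) -> int:
        # Assumption of this sketch: len(df) just delegates to count_rows().
        return self.count_rows()


df = ToyDataFrame([{"a": [1, 2, 3]}, {"a": [4, 5]}, {}])
print(df.count_rows(), len(df))
```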
Dynamic Runners as default runners
Our Dynamic Ray and Py runners are now the default runners, giving us much faster execution times!
See: #564
Enhancements
Unit testing for DataFrame accessor methods #557
Add multithreading to .url.download() #601
Add multithreading to PyRunner #599
Rust Expressions / Table integration into Python #598
Join microbenchmarks #597
Microbenchmarks: add CSV read and parameterize aggregations #595
Allow creation of DataFrames from NumPy/PyArrow arrays in .from_pydict #587
Refactor code to retrieve visualization logic from a registry #583
Propagate file size metadata to ResourceRequests. #577
Run code coverage per process to get better ray coverage #566
Fix style CI isort issue #563
[rust] PyDataTypes and casting operators #562
Better testing and validation of DataFrame.from_py* constructors #558
Naming and class refactors for physical plan #538
[rust] Series, including kernels for comparisons, arithmetic, broadcasting, etc. #531
Refactor @udf to use type annotations #560
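The multithreading added to .url.download() (#601) follows a common pattern for I/O-bound work: fan the requests out over a thread pool so that network waits overlap. The following standard-library sketch shows that general pattern only. The download_all helper, its fetch parameter, and the default max_workers value are hypothetical and are not Daft's code. A stub fetch function is used so the example runs offline.

```python
# Hypothetical sketch, NOT Daft's implementation of .url.download():
# run I/O-bound fetches concurrently on a thread pool.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def download_all(
    urls: List[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,  # hypothetical default, chosen only for illustration
) -> List[bytes]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map runs fetch() concurrently but yields results
        # in the same order as the input URLs.
        return list(pool.map(fetch, urls))


# Usage with a stub fetch function instead of real network access.
results = download_all(["https://example.com/a", "https://example.com/b"],
                       lambda url: url.encode())
print(results)
```

Threads suit this workload because each fetch spends most of its time waiting on the network rather than holding the interpreter, so several downloads can be in flight at once.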
Bug Fixes
Fix @udf type inference for class-based stateful UDFs #573
Fix join bug with empty PyArrow chunked arrays #606
Drop numpy split for pylist block and enhance pretty printing for explain() #604
Fix joins when both sources have duplicate column names #592
Optimizer tests #590
Fix join optimizer bug #589
Fix errors when reading CSV/JSON from HTTPS #581
DynamicRayRunner: Worker thrashing fixes #554
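Several items above (#560, #573, and closed issue #559) concern driving UDF input handling from Python type annotations. As a rough illustration of that general idea, and not of Daft's @udf API, the sketch below uses a hypothetical decorator, annotation_driven, that reads each parameter's annotation with typing.get_type_hints and converts the incoming column to a list or tuple accordingly. The supported conversions are assumptions made for the example.

```python
# Hypothetical sketch, NOT Daft's @udf: use a function's type annotations
# to decide how each input column is converted before the call.
import functools
import inspect
import typing
from typing import Callable, List


def annotation_driven(func: Callable) -> Callable:
    hints = typing.get_type_hints(func)
    param_names = list(inspect.signature(func).parameters)

    @functools.wraps(func)
    def wrapper(*columns):
        converted = []
        for name, col in zip(param_names, columns):
            hint = hints.get(name)
            # get_origin(List[int]) is list; for a bare class it returns None.
            origin = typing.get_origin(hint) or hint
            if origin is list:
                converted.append(list(col))
            elif origin is tuple:
                converted.append(tuple(col))
            else:
                converted.append(col)  # unannotated or other types pass through
        return func(*converted)

    return wrapper


@annotation_driven
def add_one(values: List[int]) -> List[int]:
    return [v + 1 for v in values]


@annotation_driven
def container_name(values: tuple) -> str:
    return type(values).__name__


print(add_one(range(3)), container_name([1, 2]))
```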
Documentation
Cleanup RTD docs build #586
Cleanup UDF docs #585
Fix Read the Docs YAML to run maturin develop #584
Fix HTML link in changelog #576
Fix tutorial notebooks #575
Cleanup static assets only used in landing page #570
Remove GitHub Pages publishing workflow #569
Fix links to point to correct subdomain #568
Docs refactor #567
Closed Issues
Fix self-joins wrongly omitting columns #442
Raise exception instead of ARROW_CHECK when running C++ kernels #266
Stack-based scheduling for dynamic runtime #352
Doc versioning on getdaft.io #255
AssertionError: too many colons found - when trying to read from http links #579
Colab notebook broken link #574
More flexible UDF input types using type annotation #559
NameError: name ‘_X5ix_PYCCOLO_TRACING_ENABLED’ is not defined with @polars_udf #514
M1 Mac installation issue - ImportError: dlopen #513
DataFrame.from_pylist error message #507
Get length of dataframe #358