Daft 0.0.23 Release Notes

The Daft 0.0.23 release has some amazing highlights:

  • Our Dynamic runners are now the default runners, thanks to @xcharleslin

  • Our first PR from @Asrst, which adds DataFrame.count_rows to get the length of a DataFrame!

New Features

DataFrame.count_rows()

Adds an API to count the number of rows in a DataFrame, along with a len method so that len() can be called on a DataFrame. Thanks to @Asrst for the PR!

See: #553
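To illustrate how a row-count method and a len method can relate, here is a minimal sketch in plain Python. ToyFrame is a hypothetical stand-in written for this example; it is not Daft's DataFrame and does not reflect how #553 is implemented. It only shows the pattern of having len() delegate to count_rows() so the two always agree.

```python
from typing import Any, Dict, List


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; NOT Daft's implementation."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        # Reject ragged input so that "number of rows" is well defined.
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = columns

    def count_rows(self) -> int:
        # All columns share one length, so any column gives the row count.
        # An empty frame (no columns) has zero rows.
        return len(next(iter(self._columns.values()), []))

    def __len__(self) -> int:
        # len(frame) delegates to count_rows(), keeping both in sync.
        return self.count_rows()


frame = ToyFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
print(frame.count_rows())  # 3
print(len(frame))          # 3
```

Routing len() through count_rows() keeps a single source of truth for the row count, which is one reasonable way to offer both spellings.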

Dynamic Runners as default runners

Our Dynamic Ray and Py runners are now the default runners, giving us much faster execution times!

See: #564

Enhancements

  • Unit testing for dataframe accessor methods #557

  • Add multithreading to .url.download() #601

  • Add multithreading to PyRunner #599

  • Rust Expressions / Table integration into Python #598

  • Join microbenchmarks #597

  • Microbenchmarks: Add csv read. Parameterize aggregations. #595

  • Allow for creation of DataFrames from numpy/Pyarrow arrays in .from_pydict #587

  • Refactor code to retrieve visualization logic from a registry #583

  • Propagate file size metadata to ResourceRequests. #577

  • Run code coverage per process to get better ray coverage #566

  • Fix style CI isort issue #563

  • [rust] PyDataTypes and casting operators #562

  • Better testing and validation of DataFrame.from_py* constructors #558

  • Naming and class refactors for physical plan #538

  • [rust] Series including kernels for comparisons, arithmetic and broadcasting, etc #531

  • Refactor @udf to use type annotations #560

Bug Fixes

  • Fix @udf type inference for class-based stateful UDFs #573

  • Fix join bug with empty PyArrow chunked arrays #606

  • Drop numpy split for pylist block and enhance pretty printing for explain() #604

  • Fixes Joins when we have duplicate names from each source #592

  • Optimizer tests #590

  • Fixes Join Optimizer Bug #589

  • Fix reading CSV/JSON errors from https #581

  • DynamicRayRunner: Worker thrashing fixes #554

Documentation

  • Cleanup RTD docs build #586

  • Cleanup UDF docs #585

  • Fix readthedocs yaml to do maturin develop #584

  • Fix HTML link in changelog #576

  • Fix tutorial notebooks #575

  • Cleanup static assets only used in landing page #570

  • Remove github pages publishing workflow #569

  • Fix links to point to correct subdomain #568

  • Docs refactor #567

Closed Issues

  • Fix self-joins wrongly omitting columns #442

  • Raise exception instead of ARROW_CHECK when running C++ kernels #266

  • Stack based scheduling for dynamic runtime #352

  • Doc versioning on getdaft.io #255

  • AssertionError: too many colons found - when trying to read from http links #579

  • Colab notebook broken link #574

  • More flexible UDF input types using type annotation #559

  • NameError: name ‘_X5ix_PYCCOLO_TRACING_ENABLED’ is not defined with @polars_udf #514

  • M1 Mac installation issue - ImportError: dlopen #513

  • DataFrame.from_pylist error message #507

  • Get length of dataframe #358