Daft 0.0.22 Release Notes
The Daft 0.0.22 release adds substantially more test coverage and a large number of bug fixes. The highlights are:
Dynamic Runner speedups - our dynamic runners now outperform the existing runners and will be the default in the next release.
Property-based tests with Hypothesis for .sort
Refactors to enable integration of Rust Series/vPartition/Expressions
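To illustrate what a property-based test of a sort checks, here is a minimal sketch. It is not Daft's test suite: Hypothesis generates and shrinks inputs automatically, whereas this dependency-free version draws inputs with the standard library's random module. The helper name check_sort_properties is hypothetical, and sort_fn stands in for whatever sort implementation is under test.

```python
import random
from collections import Counter


def check_sort_properties(sort_fn, trials=200, seed=0):
    """Check generic sort properties on randomly generated integer lists.

    Hypothetical helper for illustration; `sort_fn` takes a list and
    returns a new list in ascending order.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(0, 30)
        data = [rng.randint(-50, 50) for _ in range(size)]
        out = sort_fn(list(data))
        # Property 1: every element is <= its successor.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Property 2: the output is a permutation of the input.
        assert Counter(out) == Counter(data)
        # Property 3: sorting already-sorted output changes nothing.
        assert sort_fn(list(out)) == out
    return trials


# Usage: the built-in sorted() satisfies all three properties.
check_sort_properties(sorted)
```

The point of the approach is that the test states what must hold for any input, rather than enumerating hand-picked cases.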
Enhancements
Always translate LocalLimit logical plan to its own physical plan. #539
Performance improvements for distributed Ray clusters. #537
Add resource request to dynamic runners #530
Refactor resource requests out of Expressions #528
Introduce Schema and Fields for logical plan schema rather than use ExpressionList #516
Refactor Expressions to drop column ids #508
DynamicRayRunner: increase inflight tasks; fix spread condition #494
Add DataFrame.from_glob_path, deprecating DataFrame.from_files #492
[Rust] Table and Series implementations that support arrow interops and expr evaluation #443
Bug Fixes
Add filter step to hypothesis test and fix filter bug #551
Skip Ray resource request tests if ray version less than 2 #550
Fix Expression input_mapping query optimization bug #536
Remove walrus operators for Py 3.7 compatibility #489
[bugfix] Fix bisect left behavior when search sorting in reverse for utf8 and numeric arrays #547
[bugfix] Fix Behavior in Search Sorted (Partitioning) for Nulls and NaNs #545
Fix to_pydict to return python lists instead of arrow arrays #527
Fix self join where column name can conflict #521
Fix dynamic runners sorting #512
Fixes #299 to work without relying on expression IDs #510
Fix anonymous s3 file access in .url.download() #505
Add custom handling of s3 filesystem creation when no credentials are found #504
Fix Literal typing import for 3.7 compatibility #498
Dynamic runners: Assert against negative limit() #463
Better user messages on import errors of optional dependencies #462
Add handling of missing filepaths with FileNotFoundError #460
Add fix for duplicate URL downloads only filling out bytes for one row #361
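For readers unfamiliar with the "bisect left" behavior referenced in #547, the following is a minimal sketch of a leftmost binary search that works on both ascending and descending data. It is an illustration only, not Daft's implementation; the function name search_sorted_left and its descending parameter are assumptions, and nulls and NaNs (the subject of #545) are deliberately not handled here.

```python
def search_sorted_left(values, target, descending=False):
    """Return the leftmost index where `target` can be inserted into
    `values` while preserving its order.

    Hypothetical helper for illustration; `values` must already be sorted
    ascending, or descending when `descending=True`.
    """
    lo, hi = 0, len(values)
    while lo < hi:
        mid = (lo + hi) // 2
        # Does values[mid] come strictly before target in the sort order?
        if descending:
            comes_before = values[mid] > target
        else:
            comes_before = values[mid] < target
        if comes_before:
            lo = mid + 1
        else:
            hi = mid
    return lo


# Usage: the same target lands before the run of equal values either way.
ascending_index = search_sorted_left([1, 3, 3, 5], 3)
descending_index = search_sorted_left([5, 3, 3, 1], 3, descending=True)
```

The only difference between the two directions is the comparison that decides whether an element precedes the target; forgetting to flip it for reversed data is the kind of mistake a reverse-sort search can make.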
Testing
Better organize tests, separating optimizer tests from dataframe tests #552
Fix hypothesis test to assert that nulls greater than all values #549
[codecov] Enable Code Coverage for Rust From python tests #548
Add sort tests #544
Skip Ray runner tests in property based testing #543
Tests for dataframe repr and html repr #541
[Coverage] Update CodeCov Threshold to 1 percent #540
[Coverage] Batch upload of coverage files to CodeCov, ignore non-daft python files, update comment config for CodeCov #526
Add simple unit tests for Aggs #525
Fix hypothesis sort test #524
add rust to code coverage #518
Codecov Python Code Coverage #517
Property based testing #515
Add pytest benchmarking for benchmark suites with a simple agg test #455
Build Changes
Fix Ray grpcio issues in CI #555
Fix cloud ray tutorial #542
Run tutorial notebooks in CI #532
Pin polars to versions <= 0.15.18 due to issue #6584 #523
Temporary workaround for issue #501 #502
downgrade python version in CI #500
Fix CI issue with pre-commit toml sorting #490
Disable telemetry in CI jobs #471
Daft Publishing: dont increment patch version to not clobber newer version of daft #456
Documentation
Closed Issues
Unable to access public S3 buckets without credentials #503
Run CI in Python 3.7 to ensure compatibility #499
Fix walrus usage for 3.7 compatibility #488
More informative ImportError on failed imports of optional dependencies #459
MNIST tutorial broken #534
Dynamic Runners should respect ResourceRequests #529
Dataframe tests for dataframe API #522
DataFrame.to_pydict() should produce Python objects, not Arrow objects #467
IndexError instead of FileNotFoundError when attempting to read from an invalid path #457
Check execution-time error presentation in new Runners #440
Run file listing for DataFrame.from_files inside the Runner #426
Expanded datetime/interval expression support #373
Fix caching semantics #366
More String Expressions #333
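As an illustration of the behavior described in #457, the sketch below shows why an unmatched path can surface as an IndexError and how an explicit check turns it into a FileNotFoundError. This is not Daft's code: the helper list_files is hypothetical and uses only the standard library's glob module.

```python
import glob


def list_files(pattern):
    """Return the sorted paths matching `pattern`.

    Hypothetical helper for illustration. glob.glob returns an empty list
    when nothing matches, so code that later indexes the result (for
    example paths[0]) would fail with an unhelpful IndexError. Checking up
    front gives the caller a clearer error.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files found at {pattern}")
    return paths
```

Raising at listing time keeps the error close to the invalid path instead of deep inside later processing.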