Daft 0.0.19 Release Notes
The Daft 0.0.19 release packs a bunch of new features and bugfixes from our new contributors! The highlights are:
New list aggregation for collecting the items in each group into a Python list (#346)
New Features
New List aggregation
Users can now groupby and then aggregate each group into a Python list!
See: #346
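For illustration, here is a minimal sketch of the new aggregation. The construction (from_pydict, column expressions, and the "list" aggregation name) is an assumption about the 0.0.19 API and may not match it exactly:

```python
from daft import DataFrame

# Hypothetical example data; column names are placeholders.
df = DataFrame.from_pydict(
    {
        "group": ["a", "a", "b"],
        "value": [1, 2, 3],
    }
)

# Group by "group", then aggregate each group's values into a Python list.
listed = df.groupby(df["group"]).agg([(df["value"].alias("values"), "list")])
listed.show()
# Expected shape of the result:
#   group | values
#   a     | [1, 2]
#   b     | [3]
```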
Enhancements
Add visualizations to DataFrame repr #359
Allow subscripting of GroupedDataFrame to access its columns #285
Support wider Ray version range in requirements #234
Rename from_parquet and from_csv to read_*, deprecate the former #218
Use a simple disk-based cache for remote file scans #329
Fix daft install during cluster warm-up #341
Cache files locally during setup phase in benchmarking #330
Add pipelined script for generating parquet files in s3 #328
Fix broken links in documentation using relative links #327
Add new benchmarking fields and remove --output_csv_headers #326
Fix Broken Link Checker #323
Rename “unstructured” data to “complex” data #321
Bug Fixes
.show on an empty dataframe should return a friendlier output #307
Fix DataFrame.show() display of null integers #241
Fix DataFrameDisplay to take in a vPartition instead of pandas dataframe #334
Drop use of backspace to render explain correctly in notebook #362
Add drop projections pass to drop no-op projections #349
Add support for merging NullType Arrowblocks with regular ArrowTypes #343
Support empty dataframes, with and without schema info. #342
Build Changes
Daft is now tested against a matrix of Ray versions:
Pin Daft requirements to Ray >= 1.10.0 #337
Add CI nightly job for checking compatibility with a list of Ray versions #336
Added nightly builds! Associated PRs:
Publish nightly releases #354
Deprecations
from_parquet and from_csv are deprecated in favor of read_parquet and read_csv (#218)
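In practice the rename looks roughly like the following sketch. It assumes these are constructors on DataFrame in 0.0.19, and the file paths are placeholders:

```python
from daft import DataFrame

# Old, now-deprecated spellings (assumed signatures, shown for contrast):
# df = DataFrame.from_csv("events.csv")
# df = DataFrame.from_parquet("events.parquet")

# New spellings introduced by #218:
csv_df = DataFrame.read_csv("events.csv")
parquet_df = DataFrame.read_parquet("events.parquet")
```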