daft.DataFrame

class DataFrame(builder: LogicalPlanBuilder)

A Daft DataFrame is a table of data.

It has columns, where each column has a type and the same number of items (rows) as all other columns.

__init__(builder: LogicalPlanBuilder) -> None

Constructs a DataFrame according to a given LogicalPlan.

Rather than calling this constructor directly, users are expected to create DataFrames via the classmethods on DataFrame.

Parameters:

builder – LogicalPlanBuilder describing the steps required to arrive at this DataFrame

Methods

__init__(builder)

Constructs a DataFrame according to a given LogicalPlan.

agg(*to_agg)

Perform aggregations on this DataFrame.

agg_concat(*cols)

Performs a global list concatenation aggregation on the DataFrame.

agg_list(*cols)

Performs a global list aggregation on the DataFrame.

agg_set(*cols)

Performs a global set aggregation on the DataFrame (ignoring nulls).

any_value(*cols)

Returns an arbitrary value for each of the given columns of this DataFrame.

collect([num_preview_rows])

Executes the entire DataFrame and materializes the results.

concat(other)

Concatenates two DataFrames together vertically (row-wise).

count(*cols)

Performs a global count on the DataFrame.

count_rows()

Executes the DataFrame to count the number of rows.

describe()

Returns the Schema of the DataFrame, which provides information about each column, as a new DataFrame.

distinct()

Computes unique rows, dropping duplicates.

drop_nan(*cols)

Drops rows that contain NaNs.

drop_null(*cols)

Drops rows that contain NaNs or NULLs.

except_all(other)

Returns the set difference of two DataFrames, considering duplicates.

except_distinct(other)

Returns the set difference of two DataFrames.

exclude(*names)

Drops columns from the current DataFrame by name.

explain([show_all, format, simple, file])

Prints the (logical and physical) plans that will be executed to produce this DataFrame.

explode(*columns)

Explodes a List column, where every element in each row's List becomes its own row, and all other columns in the DataFrame are duplicated across rows.

filter(predicate)

Filters rows via a predicate expression, similar to SQL WHERE.

groupby(*group_by)

Performs a GroupBy on the DataFrame for aggregation.

intersect(other)

Returns the intersection of two DataFrames.

intersect_all(other)

Returns the intersection of two DataFrames, including duplicates.

into_partitions(num)

Splits or coalesces DataFrame to num partitions.

iter_partitions([results_buffer_size])

Begins executing this DataFrame and returns an iterator over the partitions.

iter_rows([results_buffer_size, column_format])

Returns an iterator of rows for this DataFrame.

join(other[, on, left_on, right_on, how, ...])

Column-wise join of the current DataFrame with another DataFrame, similar to a SQL JOIN.

limit(num)

Limits the rows in the DataFrame to the first N rows, similar to a SQL LIMIT.

max(*cols)

Performs a global max on the DataFrame.

mean(*cols)

Performs a global mean on the DataFrame.

melt(ids[, values, variable_name, value_name])

Alias for unpivot.

min(*cols)

Performs a global min on the DataFrame.

num_partitions()

Returns the number of partitions of the DataFrame.

pivot(group_by, pivot_col, value_col, agg_fn)

Pivots a column of the DataFrame and performs an aggregation on the values.

repartition(num, *partition_by)

Repartitions DataFrame to num partitions.

sample(fraction[, with_replacement, seed])

Samples a fraction of rows from the DataFrame.

schema()

Returns the Schema of the DataFrame, which provides information about each column, as a Python object.

select(*columns)

Creates a new DataFrame from the provided expressions, similar to a SQL SELECT.

show([n])

Executes enough of the DataFrame in order to display the first n rows.

sort(by[, desc, nulls_first])

Sorts DataFrame globally.

stddev(*cols)

Performs a global standard deviation on the DataFrame.

sum(*cols)

Performs a global sum on the DataFrame.

summarize()

Returns column statistics for the DataFrame.

to_arrow()

Converts the current DataFrame to a pyarrow Table.

to_arrow_iter([results_buffer_size])

Returns an iterator of pyarrow RecordBatches for this DataFrame.

to_dask_dataframe([meta])

Converts the current Daft DataFrame to a Dask DataFrame.

to_pandas([coerce_temporal_nanoseconds])

Converts the current DataFrame to a pandas DataFrame.

to_pydict()

Converts the current DataFrame to a Python dictionary.

to_pylist()

Converts the current DataFrame into a Python list.

to_ray_dataset()

Converts the current DataFrame to a Ray Dataset which is useful for running distributed ML model training in Ray.

to_torch_iter_dataset()

Converts the current DataFrame into a Torch IterableDataset for use with PyTorch.

to_torch_map_dataset()

Converts the current DataFrame into a map-style Torch Dataset for use with PyTorch.

transform(func, *args, **kwargs)

Apply a function that takes and returns a DataFrame.

union(other)

Returns the distinct union of two DataFrames.

union_all(other)

Returns the union of two DataFrames, including duplicates.

union_all_by_name(other)

Returns the union of two DataFrames, including duplicates, with columns matched by name.

union_by_name(other)

Returns the distinct union of two DataFrames, with columns matched by name.

unpivot(ids[, values, variable_name, value_name])

Unpivots a DataFrame from wide to long format.

where(predicate)

Filters rows via a predicate expression, similar to SQL WHERE.

with_column(column_name, expr)

Adds a column to the current DataFrame with an Expression, equivalent to a select with all current columns and the new one.

with_column_renamed(existing, new)

Renames a column in the current DataFrame.

with_columns(columns)

Adds columns to the current DataFrame with Expressions, equivalent to a select with all current columns and the new ones.

with_columns_renamed(cols_map)

Renames multiple columns in the current DataFrame.

write_csv(root_dir[, write_mode, ...])

Writes the DataFrame as CSV files, returning a new DataFrame with paths to the files that were written.

write_deltalake(table[, partition_cols, ...])

Writes the DataFrame to a Delta Lake table, returning a new DataFrame with the operations that occurred.

write_iceberg(table[, mode, io_config])

Writes the DataFrame to an Iceberg table, returning a new DataFrame with the operations that occurred.

write_lance(uri[, mode, io_config])

Writes the DataFrame to a Lance table.

write_parquet(root_dir[, compression, ...])

Writes the DataFrame as parquet files, returning a new DataFrame with paths to the files that were written.

Attributes

column_names

Returns the column names of the DataFrame as a list of strings.

columns

Returns the columns of the DataFrame as a list of Expressions.