- class daft.DataFrame(builder: daft.logical.builder.LogicalPlanBuilder)
A Daft DataFrame is a table of data. It has columns, where each column has a type and the same number of items (rows) as all other columns.
- __init__(builder: daft.logical.builder.LogicalPlanBuilder) -> None
Constructs a DataFrame according to a given LogicalPlan. Users should not call this constructor directly; instead, they should use the classmethods on DataFrame to create a DataFrame.
builder – LogicalPlanBuilder describing the steps required to arrive at this DataFrame.
Perform aggregations on this DataFrame.
Performs a global list-concatenation aggregation on the DataFrame.
Performs a global list aggregation on the DataFrame.
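To make the difference between the two list aggregations concrete, here is a plain-Python sketch that does not call Daft. It treats a column as an ordinary Python list, and the helper names `agg_list_column` and `agg_concat_column` are hypothetical. The list aggregation gathers every value into one list; the concatenation variant assumes each value is already a list and joins those lists end to end. Null handling is not modelled.

```python
from typing import Any, List

def agg_list_column(values: List[Any]) -> List[Any]:
    """Global list aggregation: every value of the column gathered into one list."""
    return list(values)

def agg_concat_column(values: List[List[Any]]) -> List[Any]:
    """Global list-concatenation aggregation: each value is itself a list,
    and the lists are joined end to end into a single list."""
    result: List[Any] = []
    for item in values:
        result.extend(item)
    return result

# Each call yields the single value of a one-row result.
print(agg_list_column([1, 2, 3]))            # [1, 2, 3]
print(agg_concat_column([[1, 2], [], [3]]))  # [1, 2, 3]
```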
Executes the entire DataFrame and materializes the results.
Concatenates two DataFrames together in a "vertical" concatenation.
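A vertical concatenation stacks the rows of one DataFrame beneath those of another, so the result keeps the same columns and has the combined row count. The sketch below models a DataFrame as a dict mapping column names to equal-length lists; `concat_vertical` is a hypothetical helper, not Daft's API, and as a simplifying assumption it requires both inputs to list identical column names in the same order.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)

def concat_vertical(top: Table, bottom: Table) -> Table:
    """Stack the rows of `bottom` underneath the rows of `top`."""
    if list(top) != list(bottom):
        raise ValueError(f"column mismatch: {list(top)} vs {list(bottom)}")
    return {name: top[name] + bottom[name] for name in top}

a = {"id": [1, 2], "name": ["x", "y"]}
b = {"id": [3], "name": ["z"]}
print(concat_vertical(a, b))  # {'id': [1, 2, 3], 'name': ['x', 'y', 'z']}
```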
Performs a global count on the DataFrame.
Executes the DataFrame to count the number of rows.
Computes unique rows, dropping duplicates.
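Deduplication compares whole rows, that is, all columns together. This plain-Python sketch (a hypothetical `distinct_rows` helper, a DataFrame modelled as a dict of equal-length column lists, values assumed hashable) keeps the first occurrence of each row; a distributed engine is not obliged to preserve that order.

```python
from typing import Any, Dict, List, Set, Tuple

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)

def distinct_rows(table: Table) -> Table:
    """Keep one copy of each distinct row, comparing all columns together."""
    names = list(table)
    seen: Set[Tuple[Any, ...]] = set()
    kept: List[Tuple[Any, ...]] = []
    for row in zip(*(table[n] for n in names)):
        if row not in seen:  # first occurrence wins
            seen.add(row)
            kept.append(row)
    return {n: [row[i] for row in kept] for i, n in enumerate(names)}

t = {"a": [1, 1, 2], "b": ["x", "x", "y"]}
print(distinct_rows(t))  # {'a': [1, 2], 'b': ['x', 'y']}
```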
Drops rows that contain NaNs.
Drops rows that contain NaNs or NULLs.
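The two drop operations differ only in which values count as missing: NaN is a floating-point value, while NULL marks an absent value. In this plain-Python sketch, NULL is modelled as Python `None`, a DataFrame is a dict of equal-length column lists, and the helper names are hypothetical; the NULL variant drops rows containing either kind of value, following the description of that operation.

```python
import math
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)

def _is_nan(v: Any) -> bool:
    return isinstance(v, float) and math.isnan(v)

def _drop_rows(table: Table, bad: Callable[[Any], bool]) -> Table:
    """Drop every row in which any column holds a value for which `bad` is true."""
    names = list(table)
    n_rows = len(table[names[0]]) if names else 0
    keep = [i for i in range(n_rows) if not any(bad(table[n][i]) for n in names)]
    return {n: [table[n][i] for i in keep] for n in names}

def drop_nan_rows(table: Table) -> Table:
    return _drop_rows(table, _is_nan)

def drop_null_rows(table: Table) -> Table:
    # NULLs (modelled as None) and NaNs both cause the row to be dropped.
    return _drop_rows(table, lambda v: v is None or _is_nan(v))

t = {"x": [1.0, float("nan"), None, 4.0], "y": ["a", "b", "c", "d"]}
print(drop_nan_rows(t))   # {'x': [1.0, None, 4.0], 'y': ['a', 'c', 'd']}
print(drop_null_rows(t))  # {'x': [1.0, 4.0], 'y': ['a', 'd']}
```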
Drops columns from the current DataFrame by name.
Prints the logical plan that will be executed to produce this DataFrame.
Explodes a List column: every element in each row's List becomes its own row, and the values of all other columns are duplicated across the new rows.
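Exploding turns one row holding an n-element list into n rows. The sketch below illustrates this with a DataFrame modelled as a dict of equal-length column lists; `explode` here is a hypothetical plain-Python helper, and it simply emits no rows for an empty list, which is an assumption rather than a statement of how Daft treats empty or null lists.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)

def explode(table: Table, list_col: str) -> Table:
    """Give each element of `list_col` its own row, repeating the other columns."""
    out: Table = {name: [] for name in table}
    for i, items in enumerate(table[list_col]):
        for item in items:
            for name in table:
                out[name].append(item if name == list_col else table[name][i])
    return out

t = {"id": [1, 2], "tags": [["a", "b"], ["c"]]}
print(explode(t, "tags"))  # {'id': [1, 1, 2], 'tags': ['a', 'b', 'c']}
```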
Performs a GroupBy on the DataFrame for aggregation.
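A GroupBy followed by an aggregation partitions rows by key and reduces each group to one output row. The sketch is plain Python under a dict-of-equal-length-lists model of a DataFrame; `groupby_agg` is hypothetical, handles a single key and a single aggregated column, and returns groups in first-seen order, which a real engine need not do.

```python
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)

def groupby_agg(table: Table, key: str, value: str,
                agg: Callable[[List[Any]], Any]) -> Table:
    """Group rows by `key` and reduce each group's `value` entries with `agg`."""
    groups: Dict[Any, List[Any]] = {}
    for k, v in zip(table[key], table[value]):
        groups.setdefault(k, []).append(v)
    keys = list(groups)  # first-seen order of the keys
    return {key: keys, value: [agg(groups[k]) for k in keys]}

t = {"dept": ["a", "b", "a"], "salary": [10, 20, 30]}
print(groupby_agg(t, "dept", "salary", sum))  # {'dept': ['a', 'b'], 'salary': [40, 20]}
```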
Splits or coalesces the DataFrame into a given number of partitions.
Begins executing this DataFrame and returns an iterator over its partitions.
join(other[, on, left_on, right_on, how])
Column-wise join of the current DataFrame with another DataFrame (other), similar to a SQL JOIN.
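One common way to execute an equality join is a hash join: index one side by key, then probe that index with each row of the other side. The plain-Python sketch below shows only an inner join on one shared key column; `inner_join` is hypothetical, assumes the two tables share no column names other than the key, and does not model the how, left_on, or right_on options from the signature.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)

def inner_join(left: Table, right: Table, on: str) -> Table:
    """Inner join on a single key column present in both tables."""
    # Build phase: index the right table's row positions by key value.
    index: Dict[Any, List[int]] = {}
    for j, k in enumerate(right[on]):
        index.setdefault(k, []).append(j)
    right_cols = [c for c in right if c != on]
    out: Table = {c: [] for c in list(left) + right_cols}
    # Probe phase: one output row per matching (left, right) pair.
    for i, k in enumerate(left[on]):
        for j in index.get(k, []):
            for c in left:
                out[c].append(left[c][i])
            for c in right_cols:
                out[c].append(right[c][j])
    return out

users = {"id": [1, 2, 3], "name": ["ann", "bob", "cy"]}
orders = {"id": [1, 1, 3], "item": ["pen", "ink", "pad"]}
print(inner_join(users, orders, on="id"))
# {'id': [1, 1, 3], 'name': ['ann', 'ann', 'cy'], 'item': ['pen', 'ink', 'pad']}
```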
Limits the rows in the DataFrame to the first N rows, similar to a SQL LIMIT.
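Because only the first N rows are needed, a limit can stop reading input once it has enough rows. The sketch below makes that visible with a generator of partitions; each partition is a dict of equal-length column lists, `limit` and `partitions` are hypothetical helpers, and nothing here describes Daft's actual execution strategy.

```python
from typing import Any, Dict, Iterable, Iterator, List

Table = Dict[str, List[Any]]  # one partition: column name -> equal-length values

def limit(partitions: Iterable[Table], n: int) -> Table:
    """Return the first `n` rows, pulling partitions only until `n` rows are found."""
    out: Table = {}
    remaining = n
    if remaining <= 0:
        return out
    for part in partitions:
        for name, values in part.items():
            out.setdefault(name, []).extend(values[:remaining])
        # Columns share a length, so any one column gives the partition's row count.
        remaining -= min(remaining, len(next(iter(part.values()), [])))
        if remaining == 0:
            break  # stop before asking for another partition
    return out

consumed: List[int] = []  # first value of every partition actually read

def partitions() -> Iterator[Table]:
    for part in ({"x": [1, 2]}, {"x": [3, 4]}, {"x": [5, 6]}):
        consumed.append(part["x"][0])
        yield part

print(limit(partitions(), 3))  # {'x': [1, 2, 3]}
print(consumed)                # [1, 3]
```

The third partition is never requested, which is the point of the early exit.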
Performs a global max on the DataFrame.
Performs a global mean on the DataFrame.
Performs a global min on the DataFrame.
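A global aggregation reduces each selected column to a single value, producing a one-row result, in contrast to the one-row-per-group output of a GroupBy. This plain-Python sketch models a DataFrame as a dict of equal-length column lists, uses a hypothetical `global_agg` helper, and ignores nulls entirely.

```python
from statistics import fmean
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)

def global_agg(table: Table, columns: List[str],
               fn: Callable[[List[Any]], Any]) -> Table:
    """Reduce each listed column to one value with `fn`, giving a one-row table."""
    return {c: [fn(table[c])] for c in columns}

t = {"price": [3, 1, 2], "qty": [10, 30, 20]}
print(global_agg(t, ["price", "qty"], max))  # {'price': [3], 'qty': [30]}
print(global_agg(t, ["price"], min))         # {'price': [1]}
print(global_agg(t, ["qty"], fmean))         # {'qty': [20.0]}
```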
Repartitions the DataFrame into a given number of partitions.
Returns the Schema of the DataFrame, which provides information about each column.
Creates a new DataFrame from the provided expressions, similar to a SQL SELECT.
Executes enough of the DataFrame to display its first rows.
Sorts the DataFrame globally.
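Sorting globally means a single order across all rows, not an order within each partition. The sketch below (plain Python, partitions as dicts of equal-length column lists, a hypothetical `sort_globally` helper) gathers every partition before sorting; it shows the expected result, not how a distributed engine would reach it without collecting all rows in one place.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # one partition: column name -> equal-length values

def sort_globally(partitions: List[Table], by: str, desc: bool = False) -> Table:
    """Order every row of every partition by column `by` as one sequence."""
    if not partitions:
        return {}
    names = list(partitions[0])
    # Gather all rows so the order holds across partitions, not just within each.
    merged = {n: [v for part in partitions for v in part[n]] for n in names}
    order = sorted(range(len(merged[by])), key=merged[by].__getitem__, reverse=desc)
    return {n: [merged[n][i] for i in order] for n in names}

parts = [{"k": [3, 1], "v": ["d", "b"]}, {"k": [2, 0], "v": ["c", "a"]}]
print(sort_globally(parts, "k"))             # {'k': [0, 1, 2, 3], 'v': ['a', 'b', 'c', 'd']}
print(sort_globally(parts, "k", desc=True))  # {'k': [3, 2, 1, 0], 'v': ['d', 'c', 'b', 'a']}
```

Sorting each partition separately would instead give k = [1, 3, 0, 2], which is why the word "globally" matters.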
Performs a global sum on the DataFrame.
Converts the current DataFrame to a pyarrow Table.
Converts the current Daft DataFrame to a Dask DataFrame.
Converts the current DataFrame to a pandas DataFrame.
Converts the current DataFrame to a Python dictionary.
Converts the current DataFrame to a Ray Dataset, which is useful for running distributed ML model training in Ray.
Convert the current DataFrame into a Torch IterableDataset for use with PyTorch.
Convert the current DataFrame into a map-style Torch Dataset for use with PyTorch.
Filters rows via a predicate expression, similar to a SQL WHERE clause.
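A filter keeps exactly the rows for which the predicate holds. In this plain-Python sketch the predicate is an ordinary Python function over a row dict rather than a Daft Expression, the DataFrame is a dict of equal-length column lists, and `where` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)
Row = Dict[str, Any]

def where(table: Table, predicate: Callable[[Row], bool]) -> Table:
    """Keep only the rows for which `predicate` is true."""
    names = list(table)
    n_rows = len(table[names[0]]) if names else 0
    keep = [i for i in range(n_rows) if predicate({n: table[n][i] for n in names})]
    return {n: [table[n][i] for i in keep] for n in names}

t = {"name": ["ann", "bob", "cy"], "age": [34, 17, 52]}
print(where(t, lambda row: row["age"] >= 18))  # {'name': ['ann', 'cy'], 'age': [34, 52]}
```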
with_column(column_name, expr[, ...])
Adds a column to the current DataFrame with an Expression, equivalent to a select with all current columns and the new one.
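The equivalence between adding a column and a select over all existing columns plus the new one can be shown directly. In the sketch below, a Python function over a row dict stands in for an Expression, the DataFrame is a dict of equal-length column lists, and both `select` and `with_column` are hypothetical helpers rather than Daft's implementation.

```python
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values (equal lengths)
Row = Dict[str, Any]
ColumnFn = Callable[[Row], Any]  # stand-in for an Expression, evaluated row by row

def select(table: Table, exprs: Dict[str, ColumnFn]) -> Table:
    """Build a new table whose columns are the given named expressions."""
    names = list(table)
    n_rows = len(table[names[0]]) if names else 0
    rows = [{n: table[n][i] for n in names} for i in range(n_rows)]
    return {out: [fn(r) for r in rows] for out, fn in exprs.items()}

def with_column(table: Table, name: str, fn: ColumnFn) -> Table:
    """A select that passes every existing column through, plus the new `name`."""
    passthrough = {n: (lambda r, n=n: r[n]) for n in table}  # n=n binds each name
    return select(table, {**passthrough, name: fn})

t = {"price": [2.0, 5.0], "qty": [3, 4]}
print(with_column(t, "total", lambda r: r["price"] * r["qty"]))
# {'price': [2.0, 5.0], 'qty': [3, 4], 'total': [6.0, 20.0]}
```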
Writes the DataFrame as CSV files, returning a new DataFrame with paths to the files that were written.
write_parquet(root_dir[, compression, ...])
Writes the DataFrame as Parquet files, returning a new DataFrame with paths to the files that were written.
Returns the column names of the DataFrame as a list of strings.
Returns the columns of the DataFrame as a list of Expressions.