daft.DataFrame
- class DataFrame(builder: LogicalPlanBuilder)
A Daft DataFrame is a table of data. It has columns, where each column has a type and the same number of items (rows) as all other columns.
- __init__(builder: LogicalPlanBuilder) -> None
Constructs a DataFrame according to a given LogicalPlan. Users should not call __init__ directly; instead, use the classmethods on DataFrame to create a DataFrame.
- Parameters:
builder – LogicalPlan describing the steps required to arrive at this DataFrame
Methods
__init__(builder): Constructs a DataFrame according to a given LogicalPlan.
agg(*to_agg): Perform aggregations on this DataFrame.
agg_concat(*cols): Performs a global list concatenation agg on the DataFrame.
agg_list(*cols): Performs a global list agg on the DataFrame.
any_value(*cols): Returns an arbitrary value on this DataFrame.
collect([num_preview_rows]): Executes the entire DataFrame and materializes the results.
concat(other): Concatenates two DataFrames together in a "vertical" concatenation.
count(*cols): Performs a global count on the DataFrame.
count_rows(): Executes the DataFrame to count the number of rows.
distinct(): Computes unique rows, dropping duplicates.
drop_nan(*cols): Drops rows that contain NaNs.
drop_null(*cols): Drops rows that contain NaNs or NULLs.
exclude(*names): Drops columns from the current DataFrame by name.
explain([show_all, format, simple, file]): Prints the (logical and physical) plans that will be executed to produce this DataFrame.
explode(*columns): Explodes a List column, where every element in each row's List becomes its own row, and all other columns in the DataFrame are duplicated across rows.
filter(predicate): Filters rows via a predicate expression, similar to SQL WHERE.
groupby(*group_by): Performs a GroupBy on the DataFrame for aggregation.
into_partitions(num): Splits or coalesces DataFrame to num partitions.
iter_partitions([results_buffer_size]): Begin executing this dataframe and return an iterator over the partitions.
iter_rows([results_buffer_size]): Return an iterator of rows for this dataframe.
join(other[, on, left_on, right_on, how, ...]): Column-wise join of the current DataFrame with an other DataFrame, similar to a SQL JOIN.
limit(num): Limits the rows in the DataFrame to the first num rows, similar to a SQL LIMIT.
max(*cols): Performs a global max on the DataFrame.
mean(*cols): Performs a global mean on the DataFrame.
melt(ids[, values, variable_name, value_name]): Alias for unpivot.
min(*cols): Performs a global min on the DataFrame.
num_partitions(): Returns the number of partitions of this DataFrame.
pivot(group_by, pivot_col, value_col, agg_fn): Pivots a column of the DataFrame and performs an aggregation on the values.
repartition(num, *partition_by): Repartitions DataFrame to num partitions.
sample(fraction[, with_replacement, seed]): Samples a fraction of rows from the DataFrame.
schema(): Returns the Schema of the DataFrame, which provides information about each column.
select(*columns): Creates a new DataFrame from the provided expressions, similar to a SQL SELECT.
show([n]): Executes enough of the DataFrame in order to display the first n rows.
sort(by[, desc]): Sorts DataFrame globally.
stddev(*cols): Performs a global standard deviation on the DataFrame.
sum(*cols): Performs a global sum on the DataFrame.
to_arrow(): Converts the current DataFrame to a pyarrow Table.
to_arrow_iter([results_buffer_size]): Return an iterator of pyarrow RecordBatches for this dataframe.
to_dask_dataframe([meta]): Converts the current Daft DataFrame to a Dask DataFrame.
to_pandas([coerce_temporal_nanoseconds]): Converts the current DataFrame to a pandas DataFrame.
to_pydict(): Converts the current DataFrame to a python dictionary.
to_pylist(): Converts the current DataFrame into a python list.
to_ray_dataset(): Converts the current DataFrame to a Ray Dataset, which is useful for running distributed ML model training in Ray.
to_torch_iter_dataset(): Converts the current DataFrame into a Torch IterableDataset for use with PyTorch.
to_torch_map_dataset(): Converts the current DataFrame into a map-style Torch Dataset for use with PyTorch.
transform(func, *args, **kwargs): Apply a function that takes and returns a DataFrame.
unpivot(ids[, values, variable_name, value_name]): Unpivots a DataFrame from wide to long format.
where(predicate): Filters rows via a predicate expression, similar to SQL WHERE.
with_column(column_name, expr[, ...]): Adds a column to the current DataFrame with an Expression, equivalent to a select with all current columns and the new one.
with_columns(columns[, resource_request]): Adds columns to the current DataFrame with Expressions, equivalent to a select with all current columns and the new ones.
write_csv(root_dir[, partition_cols, io_config]): Writes the DataFrame as CSV files, returning a new DataFrame with paths to the files that were written.
write_deltalake(table[, partition_cols, ...]): Writes the DataFrame to a Delta Lake table, returning a new DataFrame with the operations that occurred.
write_iceberg(table[, mode]): Writes the DataFrame to an Iceberg table, returning a new DataFrame with the operations that occurred.
write_lance(uri[, mode, io_config]): Writes the DataFrame to a Lance table.
write_parquet(root_dir[, compression, ...]): Writes the DataFrame as Parquet files, returning a new DataFrame with paths to the files that were written.
Attributes
column_names: Returns column names of DataFrame as a list of strings.
columns: Returns columns of DataFrame as a list of Expressions.