daft.DataFrame.join

DataFrame.join(other: DataFrame, on: Optional[Union[List[Union[Expression, str]], Expression, str]] = None, left_on: Optional[Union[List[Union[Expression, str]], Expression, str]] = None, right_on: Optional[Union[List[Union[Expression, str]], Expression, str]] = None, how: str = 'inner', strategy: Optional[str] = None) -> DataFrame

Column-wise join of the current DataFrame with another DataFrame, similar to a SQL JOIN.

Note

Although self joins are supported, we currently duplicate the logical plan for the right side and recompute the entire tree. Caching for this is on the roadmap.

Parameters:
  • other (DataFrame) – the right DataFrame to join on.

  • on (Optional[Union[List[ColumnInputType], ColumnInputType]], optional) – key or keys to join on; use this when the key names match on the left and right sides. Defaults to None.

  • left_on (Optional[Union[List[ColumnInputType], ColumnInputType]], optional) – key or keys to join on in the left DataFrame. Defaults to None.

  • right_on (Optional[Union[List[ColumnInputType], ColumnInputType]], optional) – key or keys to join on in the right DataFrame. Defaults to None.

  • how (str, optional) – the type of join to perform; currently only inner is supported. Defaults to “inner”.

  • strategy (Optional[str]) – The join strategy (algorithm) to use; currently “hash”, “sort_merge”, “broadcast”, and None are supported, where None chooses the join strategy automatically during query optimization. The default is None.
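To make the inner-join semantics and the idea behind a "hash" strategy concrete, here is a minimal pure-Python sketch over lists of dict rows. It is not Daft's implementation: the function name hash_inner_join and the row representation are hypothetical, chosen only for illustration, and the sketch handles a single key column per side.

```python
from collections import defaultdict


def hash_inner_join(left, right, left_on, right_on):
    """Inner-join two lists of dict rows on one key column per side.

    Hypothetical illustration only; this is not how Daft stores or joins data.
    """
    # Build phase: index the right-side rows by their join-key value.
    index = defaultdict(list)
    for row in right:
        index[row[right_on]].append(row)

    # Probe phase: emit one merged row per matching (left, right) pair.
    # Left rows without a match are dropped, which is what makes this "inner".
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[left_on], []):
            merged = dict(lrow)
            # Copy right-side columns except the key, whose value is already
            # present under the left-side key name. Name clashes between
            # non-key columns are not handled in this sketch.
            merged.update({k: v for k, v in rrow.items() if k != right_on})
            joined.append(merged)
    return joined


left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
right = [
    {"user_id": 1, "score": 10},
    {"user_id": 3, "score": 30},
    {"user_id": 3, "score": 31},
]
result = hash_inner_join(left, right, left_on="id", right_on="user_id")
```

The row with id 2 has no partner on the right and is dropped, while id 3 matches twice and therefore appears twice in the result.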

Raises:
  • ValueError – if on is passed in and left_on or right_on is not None.

  • ValueError – if on is None but both left_on and right_on are not defined.
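The two error conditions above can be sketched as a small key-resolution helper. This is an assumption-laden illustration, not Daft's source: resolve_join_keys is a hypothetical name, it accepts only string keys (not Expression objects), and it reads the second rule as "when on is None, both left_on and right_on must be given".

```python
from typing import List, Optional, Tuple, Union

Keys = Union[str, List[str]]


def resolve_join_keys(
    on: Optional[Keys] = None,
    left_on: Optional[Keys] = None,
    right_on: Optional[Keys] = None,
) -> Tuple[List[str], List[str]]:
    """Return (left_keys, right_keys) or raise ValueError on bad combinations.

    Hypothetical helper that mirrors the documented rules; string keys only.
    """

    def as_list(keys: Keys) -> List[str]:
        # Accept a single key or a list of keys.
        return [keys] if isinstance(keys, str) else list(keys)

    if on is not None:
        # Rule 1: `on` cannot be combined with `left_on` or `right_on`.
        if left_on is not None or right_on is not None:
            raise ValueError("pass either `on`, or `left_on` and `right_on`")
        keys = as_list(on)
        return keys, list(keys)

    # Rule 2 (assumed reading): without `on`, both sides must be specified.
    if left_on is None or right_on is None:
        raise ValueError("without `on`, both `left_on` and `right_on` are required")
    return as_list(left_on), as_list(right_on)


same_name = resolve_join_keys(on="id")
different_names = resolve_join_keys(left_on="id", right_on=["user_id"])
```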

Returns:

Joined DataFrame.

Return type:

DataFrame