


DataFrame.join(other: DataFrame, on: Optional[Union[List[Union[Expression, str]], Expression, str]] = None, left_on: Optional[Union[List[Union[Expression, str]], Expression, str]] = None, right_on: Optional[Union[List[Union[Expression, str]], Expression, str]] = None, how: str = 'inner', strategy: Optional[str] = None) → DataFrame

Column-wise join of the current DataFrame with another DataFrame, similar to a SQL JOIN.


Although self joins are supported, we currently duplicate the logical plan for the right side and recompute the entire tree. Caching for this is on the roadmap.

Parameters:

  • other (DataFrame) – the right DataFrame to join with.

  • on (Optional[Union[List[ColumnInputType], ColumnInputType]], optional) – key or keys to join on; use this when the key names are the same on the left and right sides. Defaults to None.

  • left_on (Optional[Union[List[ColumnInputType], ColumnInputType]], optional) – key or keys to join on in the left DataFrame. Defaults to None.

  • right_on (Optional[Union[List[ColumnInputType], ColumnInputType]], optional) – key or keys to join on in the right DataFrame. Defaults to None.

  • how (str, optional) – the type of join to perform; currently only “inner” is supported. Defaults to “inner”.

  • strategy (Optional[str]) – The join strategy (algorithm) to use; currently “hash”, “sort_merge”, “broadcast”, and None are supported, where None chooses the join strategy automatically during query optimization. The default is None.

Raises:

  • ValueError – if on is passed together with left_on or right_on.

  • ValueError – if on is None and left_on and right_on are not both provided.
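The key-argument rules above can be sketched in plain Python. This is only an illustration, not Daft's implementation: `resolve_join_keys` is a hypothetical helper, keys are restricted to column-name strings (no Expression objects), and the second ValueError is assumed to fire whenever either left_on or right_on is missing.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Simplification: keys are column names only; Expression keys are not modeled.
KeyInput = Union[str, Sequence[str]]


def _as_list(keys: KeyInput) -> List[str]:
    """Normalize a single column name or a sequence of names to a list."""
    return [keys] if isinstance(keys, str) else list(keys)


def resolve_join_keys(
    on: Optional[KeyInput] = None,
    left_on: Optional[KeyInput] = None,
    right_on: Optional[KeyInput] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (left_keys, right_keys) or raise ValueError."""
    if on is not None:
        # `on` is shorthand for identical key names on both sides,
        # so it cannot be combined with side-specific keys.
        if left_on is not None or right_on is not None:
            raise ValueError("pass either `on`, or `left_on` and `right_on`, not both")
        return _as_list(on), _as_list(on)
    # Assumption: without `on`, both sides must name their keys explicitly.
    if left_on is None or right_on is None:
        raise ValueError("when `on` is None, both `left_on` and `right_on` are required")
    return _as_list(left_on), _as_list(right_on)
```

Under these assumptions, `resolve_join_keys(on="id")` yields the same key list for both sides, while `resolve_join_keys(left_on="id")` raises ValueError because the right-side key is missing.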


Returns:

Joined DataFrame.

Return type:

DataFrame

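To make the inner-join semantics concrete, here is a small plain-Python sketch over lists of dict rows. It is not how Daft executes a join: `inner_hash_join` is a hypothetical helper that uses the generic build-and-probe hash join technique, and its handling of key columns, name collisions, and None keys is a simplification chosen for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_hash_join(
    left: List[Row],
    right: List[Row],
    left_on: List[str],
    right_on: List[str],
) -> List[Row]:
    """Hypothetical helper: inner join of two row lists on the given key columns."""
    # Build phase: index the right rows by their key tuple.
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in right_on)].append(r)

    # Probe phase: emit one output row per matching (left, right) pair.
    # Left rows with no match are dropped, which is what makes the join "inner".
    joined: List[Row] = []
    for l in left:
        key = tuple(l[k] for k in left_on)
        for r in index.get(key, []):
            # Simplifications: right key columns are dropped, a right column
            # overwrites a left column of the same name, and None keys compare
            # equal (SQL NULL keys would not match).
            extra = {k: v for k, v in r.items() if k not in right_on}
            joined.append({**l, **extra})
    return joined


users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
scores = [
    {"user_id": 2, "score": 20},
    {"user_id": 3, "score": 30},
    {"user_id": 4, "score": 40},
]

# Equivalent in spirit to users.join(scores, left_on="id", right_on="user_id").
rows = inner_hash_join(users, scores, left_on=["id"], right_on=["user_id"])
```

Only ids 2 and 3 appear on both sides, so `rows` holds two records; id 1 and user_id 4 have no partner and are left out.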