daft.Expression.str.split

Expression.str.split(pattern: str | daft.expressions.expressions.Expression, regex: bool = False) → Expression

Splits each string into a list of strings on the given literal or regex pattern.

Example

>>> df = daft.from_pydict({"data": ["foo.bar.baz", "a.b.c", "1.2.3"]})
>>> df.with_column("split", df["data"].str.split(".")).collect()
╭─────────────┬─────────────────╮
│ data        ┆ split           │
│ ---         ┆ ---             │
│ Utf8        ┆ List[Utf8]      │
╞═════════════╪═════════════════╡
│ foo.bar.baz ┆ [foo, bar, baz] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.b.c       ┆ [a, b, c]       │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2.3       ┆ [1, 2, 3]       │
╰─────────────┴─────────────────╯

Split on a regex pattern

>>> df = daft.from_pydict({"data": ["foo.bar...baz", "a.....b.c", "1.2...3.."]})
>>> df.with_column("split", df["data"].str.split(r"\.+", regex=True)).collect()
╭───────────────┬─────────────────╮
│ data          ┆ split           │
│ ---           ┆ ---             │
│ Utf8          ┆ List[Utf8]      │
╞═══════════════╪═════════════════╡
│ foo.bar...baz ┆ [foo, bar, baz] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.....b.c     ┆ [a, b, c]       │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2...3..     ┆ [1, 2, 3, ]     │
╰───────────────┴─────────────────╯
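
The last row above ends with an empty string because the input ends with a separator match. As a rough analogy only (plain Python, not Daft's implementation), the two modes behave much like `str.split` and `re.split`; the helper names `split_literal` and `split_regex` are hypothetical and exist only for this sketch.

```python
import re


def split_literal(value, pattern):
    # Literal mode: the pattern is matched character for character.
    return value.split(pattern)


def split_regex(value, pattern):
    # Regex mode: the pattern is interpreted as a regular expression.
    return re.split(pattern, value)


# In plain Python, a literal "." splits at every single dot, so runs of dots
# leave empty strings between them.
print(split_literal("1.2...3..", "."))

# The regex r"\.+" treats each run of dots as one separator; the trailing
# run still leaves one empty string at the end.
print(split_regex("1.2...3..", r"\.+"))
```

If trailing empty strings are unwanted, one option is to strip trailing separators from the input before splitting.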

Parameters:
  • pattern – The pattern on which to split each string: either a single string, or an expression (column) that supplies the patterns.

  • regex – Whether the pattern is a regular expression. Defaults to False.

Returns:

A List[Utf8] expression containing the substrings produced by splitting each string in the column.

Return type:

Expression