daft.Expression.str.split
- Expression.str.split(pattern: str | daft.expressions.expressions.Expression, regex: bool = False) → Expression
Splits each string on the given literal or regex pattern into a list of strings.
Example
>>> import daft
>>> df = daft.from_pydict({"data": ["foo.bar.baz", "a.b.c", "1.2.3"]})
>>> df.with_column("split", df["data"].str.split(".")).collect()
╭─────────────┬─────────────────╮
│ data        ┆ split           │
│ ---         ┆ ---             │
│ Utf8        ┆ List[Utf8]      │
╞═════════════╪═════════════════╡
│ foo.bar.baz ┆ [foo, bar, baz] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.b.c       ┆ [a, b, c]       │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2.3       ┆ [1, 2, 3]       │
╰─────────────┴─────────────────╯
Split on a regex pattern
>>> df = daft.from_pydict({"data": ["foo.bar...baz", "a.....b.c", "1.2...3.."]})
>>> df.with_column("split", df["data"].str.split(r"\.+", regex=True)).collect()
╭───────────────┬─────────────────╮
│ data          ┆ split           │
│ ---           ┆ ---             │
│ Utf8          ┆ List[Utf8]      │
╞═══════════════╪═════════════════╡
│ foo.bar...baz ┆ [foo, bar, baz] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.....b.c     ┆ [a, b, c]       │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2...3..     ┆ [1, 2, 3, ]     │
╰───────────────┴─────────────────╯
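The last row of the regex example ends with an empty string because the trailing ".." is matched as a delimiter with nothing after it. The sketch below reproduces that behaviour, and the difference between a literal and a regex ".", using only Python's standard library. Python's `str.split` and `re.split` are stand-ins chosen for illustration (an assumption, not Daft's implementation), and Daft's regex engine may differ in syntax details.

```python
import re

# Literal split: "." is an ordinary character (analogous to regex=False).
literal = "foo.bar.baz".split(".")

# Regex split on runs of dots (analogous to regex=True with r"\.+").
# The trailing ".." is a delimiter with nothing after it, so the last
# element is an empty string.
regex_runs = re.split(r"\.+", "1.2...3..")

# An unescaped "." used as a regex matches every character, so every
# character becomes a delimiter and only empty strings remain.
unescaped = re.split(".", "a.b")

print(literal)     # ['foo', 'bar', 'baz']
print(regex_runs)  # ['1', '2', '3', '']
print(unescaped)   # ['', '', '', '']
```

When splitting on a character that is special in regular expressions, either leave regex at its default of False or escape the character in the pattern.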
- Parameters:
pattern – The pattern on which each string should be split, or a column to pick such patterns from.
regex – Whether the pattern is a regular expression. Defaults to False.
- Returns:
A List[Utf8] expression containing the string splits for each string in the column.
- Return type:
Expression