daft.Expression.str.split
- Expression.str.split(pattern: str | daft.expressions.expressions.Expression, regex: bool = False) → Expression
Splits each string on the given literal or regex pattern into a list of strings.
Example
>>> import daft
>>> df = daft.from_pydict({"data": ["daft.distributed.query", "a.b.c", "1.2.3"]})
>>> df.with_column("split", df["data"].str.split(".")).collect()
╭────────────────────────┬────────────────────────────╮
│ data                   ┆ split                      │
│ ---                    ┆ ---                        │
│ Utf8                   ┆ List[Utf8]                 │
╞════════════════════════╪════════════════════════════╡
│ daft.distributed.query ┆ [daft, distributed, query] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.b.c                  ┆ [a, b, c]                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2.3                  ┆ [1, 2, 3]                  │
╰────────────────────────┴────────────────────────────╯

(Showing first 3 of 3 rows)
Split on a regex pattern
>>> import daft
>>> df = daft.from_pydict({"data": ["daft.distributed...query", "a.....b.c", "1.2...3.."]})
>>> df.with_column("split", df["data"].str.split(r"\.+", regex=True)).collect()
╭──────────────────────────┬────────────────────────────╮
│ data                     ┆ split                      │
│ ---                      ┆ ---                        │
│ Utf8                     ┆ List[Utf8]                 │
╞══════════════════════════╪════════════════════════════╡
│ daft.distributed...query ┆ [daft, distributed, query] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.....b.c                ┆ [a, b, c]                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2...3..                ┆ [1, 2, 3, ]                │
╰──────────────────────────┴────────────────────────────╯

(Showing first 3 of 3 rows)
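The last row of the regex example ends in an empty element (`[1, 2, 3, ]`) because the input ends with a separator. The sketch below is a plain-Python analogue using the standard library, not Daft's implementation; it only illustrates why a trailing match leaves an empty string, and how a literal "." split would differ on runs of dots. How Daft itself handles repeated literal separators is an assumption here and is not shown in the examples above.

```python
import re

rows = ["daft.distributed...query", "a.....b.c", "1.2...3.."]

# Literal split: every single "." is its own separator,
# so a run of dots produces empty strings between them.
literal = [s.split(".") for s in rows]

# Regex split: r"\.+" treats each run of dots as one separator.
# A run at the end of the string still leaves a trailing empty string.
regex = [re.split(r"\.+", s) for s in rows]

print(regex[2])    # ['1', '2', '3', '']
print(literal[1])  # ['a', '', '', '', '', 'b', 'c']
```

If the trailing empty element is unwanted, it has to be dropped in a separate step after the split.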
- Parameters:
pattern – The pattern on which each string should be split, or a column to pick such patterns from.
regex – Whether the pattern is a regular expression. Defaults to False.
- Returns:
A List[Utf8] expression containing the substrings produced by splitting each string in the column.
- Return type:
Expression
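The `pattern` parameter may also be a column of patterns rather than a single string. A minimal plain-Python sketch of what that could mean follows, assuming the patterns are paired with the strings row by row. `split_rows` is a hypothetical helper written for illustration and is not part of Daft.

```python
import re
from typing import List


def split_rows(values: List[str], patterns: List[str], regex: bool = False) -> List[List[str]]:
    """Hypothetical helper: split values[i] on patterns[i], one pattern per row."""
    if len(values) != len(patterns):
        raise ValueError("values and patterns must have the same length")
    result = []
    for value, pattern in zip(values, patterns):
        # regex=False treats the pattern as a literal separator.
        parts = re.split(pattern, value) if regex else value.split(pattern)
        result.append(parts)
    return result


result = split_rows(["a.b.c", "1-2-3", "x|y"], [".", "-", "|"])
print(result)  # [['a', 'b', 'c'], ['1', '2', '3'], ['x', 'y']]
```

With `regex=False` the characters "." and "|" are matched literally, so no escaping is needed in the pattern column.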