daft.Expression.str.split


Expression.str.split(pattern: str | daft.expressions.expressions.Expression, regex: bool = False) → Expression

Splits each string on the given literal or regex pattern into a list of strings.

Example

>>> import daft
>>> df = daft.from_pydict({"data": ["daft.distributed.query", "a.b.c", "1.2.3"]})
>>> df.with_column("split", df["data"].str.split(".")).collect()
╭────────────────────────┬────────────────────────────╮
│ data                   ┆ split                      │
│ ---                    ┆ ---                        │
│ Utf8                   ┆ List[Utf8]                 │
╞════════════════════════╪════════════════════════════╡
│ daft.distributed.query ┆ [daft, distributed, query] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.b.c                  ┆ [a, b, c]                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2.3                  ┆ [1, 2, 3]                  │
╰────────────────────────┴────────────────────────────╯

(Showing first 3 of 3 rows)
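For intuition, the literal mode (regex=False) used above can be modeled in plain Python. This is a minimal sketch, not Daft's implementation: it assumes Daft's literal split behaves like Python's built-in str.split for these inputs, and split_literal is a hypothetical helper introduced only for illustration.

```python
# Hypothetical pure-Python model of Expression.str.split(pattern, regex=False).
# Assumption: Daft's literal split matches Python's str.split for these inputs.
def split_literal(values: list[str], pattern: str) -> list[list[str]]:
    # Split every string on the exact substring `pattern` (no regex semantics).
    return [s.split(pattern) for s in values]


data = ["daft.distributed.query", "a.b.c", "1.2.3"]
result = split_literal(data, ".")
print(result)
```

The printed lists match the `split` column in the table above.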

Split on a regex pattern

>>> import daft
>>> df = daft.from_pydict({"data": ["daft.distributed...query", "a.....b.c", "1.2...3.."]})
>>> df.with_column("split", df["data"].str.split(r"\.+", regex=True)).collect()
╭──────────────────────────┬────────────────────────────╮
│ data                     ┆ split                      │
│ ---                      ┆ ---                        │
│ Utf8                     ┆ List[Utf8]                 │
╞══════════════════════════╪════════════════════════════╡
│ daft.distributed...query ┆ [daft, distributed, query] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.....b.c                ┆ [a, b, c]                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2...3..                ┆ [1, 2, 3, ]                │
╰──────────────────────────┴────────────────────────────╯

(Showing first 3 of 3 rows)
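The regex example can be modeled with Python's re.split to show why the last row ends with an empty string: the trailing run of dots is a match, and the text after it is empty. The sketch also contrasts this with a literal split on ".", which treats each dot separately. It assumes Daft's regex split behaves like re.split for these simple patterns; split_regex is a hypothetical helper, not part of Daft.

```python
import re


# Hypothetical pure-Python model of Expression.str.split(pattern, regex=True).
# Assumption: Daft's regex split matches re.split for these simple patterns.
def split_regex(values: list[str], pattern: str) -> list[list[str]]:
    # Split on every non-overlapping match of the regular expression.
    return [re.split(pattern, s) for s in values]


data = ["daft.distributed...query", "a.....b.c", "1.2...3.."]

# r"\.+" matches a whole run of dots, so "..." acts as one separator.
regex_result = split_regex(data, r"\.+")

# A literal split on "." sees every dot as a separator, producing empty strings.
literal_result = [s.split(".") for s in data]

# With regex=True, a bare "." would match ANY character, so escape it
# (re.escape(".") == r"\.") to split on a literal dot.
escaped_result = split_regex(["a.b.c"], re.escape("."))

print(regex_result)
print(literal_result[1])
```

Running it prints `[['daft', 'distributed', 'query'], ['a', 'b', 'c'], ['1', '2', '3', '']]` for the regex split and `['a', '', '', '', '', 'b', 'c']` for the literal split of "a.....b.c".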

Parameters:
  • pattern – The pattern to split each string on: either a string, or an expression (such as a column) that supplies a pattern for each row.

  • regex – Whether the pattern is a regular expression. Defaults to False.
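Because pattern may also be an expression, each row can be split on its own pattern; in Daft this would be written along the lines of `df["data"].str.split(df["sep"])`, where `sep` is a hypothetical column name. The sketch below models that per-row behavior in plain Python, assuming row i is split on the pattern found in row i of the pattern column (literal mode). split_per_row is a hypothetical helper for illustration only.

```python
# Hypothetical pure-Python model of splitting with a per-row pattern column.
# Assumption: row i of `values` is split on row i of `patterns` (regex=False).
def split_per_row(values: list[str], patterns: list[str]) -> list[list[str]]:
    # zip pairs each string with the pattern from the same row.
    return [s.split(p) for s, p in zip(values, patterns)]


data = ["a,b,c", "x|y", "1-2-3"]
seps = [",", "|", "-"]
per_row = split_per_row(data, seps)
print(per_row)
```

Each row uses a different delimiter, yet every result is a list of strings, consistent with the List[Utf8] return type.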

Returns:

A List[Utf8] expression containing the string splits for each string in the column.

Return type:

Expression