daft.Expression.str.extract
Expression.str.extract(pattern: str | daft.expressions.expressions.Expression, index: int = 0) → Expression
Extracts the specified match group from the first regex match in each string in a string column.
Notes
If index is 0, the entire match is returned. If the pattern does not match or the group does not exist, a null value is returned.
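The rules above can be sketched in plain Python with the standard library `re` module. This is only an illustration of the documented behavior for a single string, not Daft's implementation: `extract_first` is a hypothetical helper, Python's `None` stands in for a null value, and Python's regex syntax may differ in places from the engine Daft uses.

```python
import re
from typing import Optional


def extract_first(text: str, pattern: str, index: int = 0) -> Optional[str]:
    """Hypothetical helper: return group `index` of the first match, or None."""
    m = re.search(pattern, text)  # first match only, like the documented behavior
    if m is None:
        return None  # pattern does not match
    if not 0 <= index <= m.re.groups:
        return None  # requested group does not exist in the pattern
    return m.group(index)  # index 0 is the entire match


regex = r"(\d)(\d*)"
print(extract_first("123-456", regex))     # 123  (entire match)
print(extract_first("123-456", regex, 1))  # 1    (first capture group)
print(extract_first("123-456", regex, 2))  # 23   (second capture group)
print(extract_first("abc", regex))         # None (no match)
print(extract_first("123-456", regex, 3))  # None (group 3 does not exist)
```

Only the first match in each string is considered, so the digits after the hyphen never appear in the result.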
Example
>>> import daft
>>> regex = r"(\d)(\d*)"
>>> df = daft.from_pydict({"x": ["123-456", "789-012", "345-678"]})
>>> df.with_column("match", df["x"].str.extract(regex)).collect()
╭─────────┬───────╮
│ x       ┆ match │
│ ---     ┆ ---   │
│ Utf8    ┆ Utf8  │
╞═════════╪═══════╡
│ 123-456 ┆ 123   │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 789-012 ┆ 789   │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 345-678 ┆ 345   │
╰─────────┴───────╯

(Showing first 3 of 3 rows)
Extract the first capture group
>>> df.with_column("match", df["x"].str.extract(regex, 1)).collect()
╭─────────┬───────╮
│ x       ┆ match │
│ ---     ┆ ---   │
│ Utf8    ┆ Utf8  │
╞═════════╪═══════╡
│ 123-456 ┆ 1     │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 789-012 ┆ 7     │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 345-678 ┆ 3     │
╰─────────┴───────╯

(Showing first 3 of 3 rows)
Parameters:
pattern – The regex pattern to match, given as a string or an expression
index – The index of the regex match group to extract; 0 (the default) returns the entire match
Returns:
A String expression with the extracted regex match
Return type:
Expression
See also