daft.Expression.str.extract
- Expression.str.extract(pattern: str | daft.expressions.expressions.Expression, index: int = 0) -> Expression
Extracts the specified match group from the first regex match in each string in a string column.
Notes
If index is 0, the entire match is returned. If the pattern does not match or the group does not exist, a null value is returned.
Example
>>> import daft
>>> regex = r"(\d)(\d*)"
>>> df = daft.from_pydict({"x": ["123-456", "789-012", "345-678"]})
>>> df.with_column("match", df["x"].str.extract(regex)).collect()
╭─────────┬─────────╮
│ x       ┆ match   │
│ ---     ┆ ---     │
│ Utf8    ┆ Utf8    │
╞═════════╪═════════╡
│ 123-456 ┆ 123     │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 789-012 ┆ 789     │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 345-678 ┆ 345     │
╰─────────┴─────────╯
Extract the first capture group
>>> df.with_column("match", df["x"].str.extract(regex, 1)).collect()
╭─────────┬─────────╮
│ x       ┆ match   │
│ ---     ┆ ---     │
│ Utf8    ┆ Utf8    │
╞═════════╪═════════╡
│ 123-456 ┆ 1       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 789-012 ┆ 7       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 345-678 ┆ 3       │
╰─────────┴─────────╯
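The index and null rules from the Notes can be modeled for a single string in plain Python. This is only a sketch of the documented semantics using the standard-library `re` module, not Daft's implementation. The helper name `extract_group` is hypothetical, and Daft's regex engine may differ from `re` in syntax details.

```python
import re
from typing import Optional


def extract_group(text: Optional[str], pattern: str, index: int = 0) -> Optional[str]:
    """Hypothetical per-string model of the rules described in the Notes."""
    if text is None:
        return None  # assumption: a null input yields a null output
    match = re.search(pattern, text)  # only the first match is considered
    if match is None:
        return None  # pattern does not match -> null
    if index < 0 or index > match.re.groups:
        return None  # requested group does not exist -> null
    return match.group(index)  # index 0 is the entire match


regex = r"(\d)(\d*)"
whole = extract_group("123-456", regex)        # entire first match
first = extract_group("123-456", regex, 1)     # first capture group
second = extract_group("123-456", regex, 2)    # second capture group
no_match = extract_group("abc", regex)         # no digits, so no match
no_group = extract_group("123-456", regex, 3)  # pattern has only two groups
```

Here `whole`, `first`, and `second` are "123", "1", and "23", while `no_match` and `no_group` are both `None`, mirroring the null values the Notes describe.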
- Parameters:
pattern – The regex pattern to match, given as a string or an expression
index – The index of the regex match group to extract; 0 (the default) returns the entire match
- Returns:
A string expression containing the extracted match group, or null where the pattern does not match or the group does not exist
- Return type:
Expression
See also