Gem: exploding string alternatives

Tuesday 28 December 2021

Here’s a Python gem: a small bit of Python that uses the power of the language and standard library well.

It’s a function to list strings generated by a pattern with embedded alternatives. It takes an input string with brace-wrapped possibilities, and generates all the strings made from making choices among them:

>>> list(explode("{Alice,Bob} ate a {banana,donut}."))
[
    'Alice ate a banana.',
    'Alice ate a donut.',
    'Bob ate a banana.',
    'Bob ate a donut.'
]

Here’s the function:

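import itertools
import re
from typing import Iterable

def explode(pattern: str) -> Iterable[str]:
    """
    Expand the brace-delimited possibilities in a string.
    """
    seg_choices = []
    # The capturing group keeps the {...} pieces in the split result.
    for segment in re.split(r"(\{.*?\})", pattern):
        if segment.startswith("{"):
            # A brace group: one choice per comma-separated alternative.
            seg_choices.append(segment.strip("{}").split(","))
        else:
            # Literal text: a single "choice".
            seg_choices.append([segment])

    for parts in itertools.product(*seg_choices):
        yield "".join(parts)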

I call this a gem because it’s concise without being tricky, and uses Python’s tools to strong effect. Let’s look at how it works.

re.split: The first step is to break the string into pieces. I used re.split(): it takes a regular expression, divides the string wherever the pattern matches, and returns a list of parts.

A subtlety that I make use of here: if the splitting regex has capturing groups (parenthesized pieces), then those captured strings are also included in the result list. The pattern is anything enclosed in curly braces, and I’ve parenthesized the whole thing so that I’ll get those bracketed pieces in the split list.

For our sample string, re.split will return these segments:

['', '{Alice,Bob}', ' ate a ', '{banana,donut}', '.']

There’s an initial empty string which might seem concerning, but it won’t be a problem: it becomes a one-choice segment that adds nothing when the parts are joined.
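To see what the capturing group buys us, here is a small sketch comparing the same split with and without the parentheses. The variable names are just for illustration.

```python
import re

pattern = "{Alice,Bob} ate a {banana,donut}."

# Without a capturing group, the matched brace groups are discarded.
without_group = re.split(r"\{.*?\}", pattern)

# With a capturing group, the matched text is kept in the result list.
with_group = re.split(r"(\{.*?\})", pattern)

print(without_group)
print(with_group)
```

Without the group, only the literal text survives, so the alternatives would be lost.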

Grouping: I used that list of segments to make another list, the choices for each segment. If a segment starts with a brace, then I strip off the braces and split on commas to get a list of the alternatives. The segments that don’t start with a brace are the non-choices part of the string, so I add them as a one-choice list. This gives us a uniform list of lists, where the inner lists are the choices for each segment of the result.

For our sample string, this is the seg_choices list:

[[''], ['Alice', 'Bob'], [' ate a '], ['banana', 'donut'], ['.']]
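As a quick illustration of the strip-and-split step on a single segment, here is a sketch. The "{red, green}" input is my own made-up example, included to show that whitespace around commas is kept as-is.

```python
# One brace-wrapped segment, as produced by the re.split step.
segment = "{Alice,Bob}"

# strip("{}") removes leading and trailing '{' and '}' characters.
inner = segment.strip("{}")
choices = inner.split(",")

# Hypothetical input: spaces after commas are not trimmed.
spaced = "{red, green}".strip("{}").split(",")

print(inner)
print(choices)
print(spaced)
```

If you want "{red, green}" to mean the same as "{red,green}", you would need to strip each alternative yourself; the function as written treats the space as part of the choice.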

itertools.product: To generate all the combinations, I used itertools.product(), which does much of the heavy lifting of this function. It takes a number of iterables as arguments and generates all the different combinations of choosing one element from each. My seg_choices list is exactly the list of arguments needed for itertools.product, and I can apply it with the star syntax.

The values from itertools.product are tuples of the choices made. The last step is to join them together and use yield to provide the value to the caller.
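Here is that last step on its own, as a minimal sketch that starts from the seg_choices list shown earlier instead of computing it:

```python
import itertools

seg_choices = [[''], ['Alice', 'Bob'], [' ate a '], ['banana', 'donut'], ['.']]

# The star unpacks the list so each inner list is a separate argument.
combos = list(itertools.product(*seg_choices))

# Each combo is a tuple with one element chosen from each inner list.
sentences = ["".join(parts) for parts in combos]

print(combos[0])
print(sentences)
```

The rightmost iterable varies fastest, which is why both Alice sentences come before the Bob ones.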

Nice.
