Friday 8 December 2017 — This is nearly eight years old. Be careful.
It’s December, which means Advent of Code is running again. It provides a new two-part puzzle every day until Christmas. They are a lot of fun, and usually are algorithmic in nature.
One of the things I like about the puzzles is they often lend themselves to writing unusual but general-purpose helpers. As I have said before, abstraction of iteration is a powerful and under-used feature of Python, so I enjoy exploring it when the opportunity arises.
For yesterday’s puzzle I needed to find the one unusual value in an otherwise uniform list. This is the kind of thing that might be in itertools if itertools had about ten times more functions than it does now. Here was my definition of the needed function:
def oddity(iterable, key=None):
"""
Find the element that is different.
The iterable has at most one element different than the others. If a
`key` function is provided, it is a function used to extract a comparison
key from each element, otherwise the elements themselves are compared.
Two values are returned: the common comparison key, and the different
element.
If all of the elements are equal, then the returned different element is
None. If there is more than one different element, an error is raised.
"""
The challenge I set for myself was to implement this function in as general and useful a way as possible. The iterable might not be a list, it could be a generator, or some other iterable. There are edge cases to consider, like if there are more than two different values.
If you want to take a look, My code is on GitHub (with tests, natch.) Fair warning: that repo has my solutions to all of the Advent of Code problems so far this year.
One problem with my implementation: it stores all the values from the iterable. For the actual Advent of Code puzzle, that was fine, it only had to deal with less than 10 values. But how would you change the code so that it didn’t store them all?
My code also assumes that the comparison values are hashable. What if you didn’t want to require that?
Suppose the iterable could be infinite? This changes the definition somewhat. You can’t detect the case of all the values being the same, since there’s no such thing as “all” the values. And you can’t detect having more than two distinct values, since you’d have to read values forever on the possibility that it might happen. How would you change the code to handle infinite iterables?
These are the kind of considerations you have to take into account to write truly general-purpose itertools functions. It’s an interesting programming exercise to work through each version would differ.
BTW: it might be that there is a way to implement my oddity function with a clever combination of things already in itertools. If so, let me know!
Comments
def oddity(iterable, key=None): it = iter(iterable) if key is None: key = lambda v: v different_value = None different_key = None different_found = False common_value = next(it) common_key = key(common_value) common_repeated = False for v in it: k = key(v) if k != common_key: if not different_found: different_value = v different_key = k different_found = True continue if common_repeated or k != different_key: raise ValueError common_value, different_value = different_value, common_value common_key, different_key = different_key, common_key common_repeated = True return common_key, different_valueMaybe this would work?
def oddity(iterable, key=None): key = key or (lambda x: x) it = iter(iterable) try: a, b, c = next(it), next(it), next(it) u, v, w = key(a), key(b), key(c) while u == v and v == w: a, b, c = b, c, next(it) u, v, w = v, w, key(c) if u == v: return c return a if v == w else b except StopIteration: return NoneOne downside which comes to my mind is that the function traverses the function too slowly, causing most pairs of values to be compared twice since the 'sliding window' advances one element at a time. This could be fixed by using e.g. itertools.islice to advance two elements at a time, at the expense of having to write extra code which deals with iterables of inconvenient lengths (3*n + 1 elements).The last sentence however limits the implementation somewhat: "If there is more than one different element, an error is raised.". This means that the function always has to traverse the entire sequence at once - which may be a long time, in case it's an infinite sequence.
Ironically, for day 7 I used collections.Counter instead of itertools:
weights = [inclusive_weight(c) for c in children(d)] if len(set(weights)) > 1: bad, good = (t[0] for t in sorted(Counter(weights).items(), key=lambda t:t[1])) ...def oddity(iterable, key=None): # type: (Iterable[T], Optional[Callable[T, K]) -> Tuple[Optional[T], Optional[T]] """find the oddball value iterable: an iterable of a type implementing equality, or mappable to a type implementing equality via key key: if provided, a function mapping iterable's values to comparable values Returns: tuple of two values where the first (or its key-mapped counterpart) appears exactly once in iterable and the second is the value that appears in all other positions in interable. If there is no value meeting one of those criteria, the tuple slot is None. """ def zipper(): for item in iterable: yield (item, item if (key is None) else key(item)) t1 = t2 = t3 = k1 = k2 = k3 = None items = zipper() sentinal = object() try: t1, k1 = items.next() t2, k2 = items.next() t3, k3 = items.next() except StopIteration: return (None, t1) if (k1 == k2) else (t1, t2) if k1 == k2: t_norm, k_norm = t1, k1 t_odd, k_odd = (sentinal, sentinal) if (k3 == k_norm) else (t3, k3) elif k1 == k3: t_norm, k_norm = t1, k1 t_odd, k_odd = t2, k2 elif k2 == k3: t_norm, k_norm = t2, k2 t_odd, k_odd = t1, k1 else: raise Exception("more than one oddity!") for t, k in items: if k == k_norm: continue if k_odd is not sentinal: raise Exception("more than one oddity!") t_odd, k_odd = t, k return (t_odd if (t_odd is not sentinal) else None, t_norm if (t_norm is not sentinal) else None)@en zyme: you are probably right that specifying an exception is overkill. In fact, when I was planning this blog post, I meant to talk about the value of leaving behavior undefined, and I forgot to!
@Serhiy, @Nick, @Oliver: thanks for your implementations. It's fascinating to see the variety of approaches :)
When iterable is finite, the break inside the loop isn't necessary for correctness. It terminates early once three different keys have been seen, since nothing after that can restore the conditions for a non-exceptional return.
def oddity(iterable, key=None): if key is None: key = lambda v: v once = {} # {key: value} of keys seen only once dups = [] # keys seen more than once for v in iterable: k = key(v) if k in once: del once[k] dups.append(k) elif k not in dups: once[k] = v if len(once) + len(dups) > 2: break if not dups: raise ValueError("No duplicated values") if len(dups) > 1: raise ValueError("Too many duplicated values") if len(once) > 1: raise ValueError("Too many distinct values") return dups[0], list(once.values() or (None,))[0]def oddity(iterable, key=None): key = key or (lambda x: x) try: return next( ((a,key(b)) if key(b)==key(c) else (b,key(a)) if key(a)==key(c) else (c, key(a))) for a,b,c in zip_longest(*[chain(*tee(iterable,3))]*3) if not (key(a) == key(b) == key(c))) except StopIteration: return NoneIt works for infinite iterators, and return None when there's no oddity. For an infinite iterator with no oddity it runs forever. It doesn't work if the iterator has more than one oddity (but it's not really compatible with infinite iterators anyway).Add a comment: