Generator comprehensions

Wednesday 11 May 2016This is close to nine years old. Be careful.

Python has a compact syntax for constructing a list with a loop and a condition, called a list comprehension:

my_list = [ f(x) for x in sequence if cond(x) ]

You can also build dictionaries with dictionary comprehensions, and sets with set comprehensions:

my_dict = { k(x): v(x) for x in sequence if cond(x) }
my_set = { f(x) for x in sequence if cond(x) }

(The syntax allows more complexity than these examples, let’s not get distracted!)

Finally, you can make a generator with similar syntax:

my_generator = ( f(x) for x in sequence if cond(x) )

Unfortunately, this is called a generator expression, not a generator comprehension. Why not? If the first three are all comprehensions, why isn’t this a comprehension?

PEP 289, Generator Expressions has detailed notes at the end which point out that Raymond Hettinger originally proposed “generator comprehensions,” that they were then resurrected by Peter Norvig as “accumulation displays,” and that Tim Peters suggested the name “generator expressions.” It does not explain why the names changed along the way.

I made a query on Twitter:

OK, #python question I don’t know the answer to: why are they called “generator expressions” and not “generator comprehensions”?

Guido’s reply gets at the heart of the matter:

Originally comprehension was part of the “literal display” notion. GenExprs are not displays.

Matt Boehm found the email where Tim Peters proposed “generator expression” that also has some details.

After reading that, I understand more. First, what’s with the word “comprehension”? As Tim pointed out, the word comes from set theory’s Axiom of Comprehension, which talks about sets formed by applying a predicate (condition) to elements of another set. This is very similar to lists formed by applying a condition to elements of another sequence.

As Guido’s tweet points out, and the subject line of the email thread makes clear (“accumulator display syntax”), the designers at the time were thinking much more about displays than they were about conditions. The word “display” here means that the syntax for the code looks like the data structure it will create. A list display (list comprehension) looks like a list. Same for set and dictionary displays. But there is no generator literal syntax, so there’s nothing for a generator display to look like, so there are no generator displays.

In that original email thread designing the feature, the word “comprehension” became synonymous with “display”, and since generators couldn’t have displays, they also couldn’t have comprehensions.

But as Tim points out in his email, the interesting part of a comprehension is the condition. The heart of the Axiom of Comprehension is the predicate. Perhaps because the condition is optional in a Python comprehension, the focus shifted to the display aspect.

I think we should call them “generator comprehensions” again. We don’t use the term “display” for these things. There’s no reason to link “comprehension” to “display,” and literal syntax.

The four different expressions (list comprehension, dict comprehension, set comprehension, and generator expressions) have an awful lot in common with each other. It would be a great shorthand to be able to discuss their similarities by talking about “comprehensions” and having it cover all four. Their similarities are more than their differences, so let’s use the same word for all four.

Proposal: call them “generator comprehensions.”

Comments

[gravatar]
I'd never thought about why it was called a generator expression instead of a comprehension but I like your proposal :)

I did find some people referencing "generator comprehensions" in a SO post from 2008! [1] Guido himself refers to "generator comprehensions" in a blog post from 2010 too. [2]

[1] http://stackoverflow.com/questions/364802/generator-comprehension
[2] http://python-history.blogspot.ca/2010/06/from-list-comprehensions-to-generator.html
[gravatar]
+1 for 4 comprehensions

...or I suppose you could define a display for generators and qualify that way. ;-)
[gravatar]
SIWOTI! Well, not really, but there are few misleading things said here IMO.

First, I put my set-theoretic hat. "The heart of the Axiom of Comprehension is the predicate" - you should probably say "was". In Cantor's original set theory, there really was such a thing as a mapping from predicate to a set, and it was called comprehension. But that obviously lead to paradoxes (e.g. Russel: take a predicate of not being an element of itself) and was abandoned. Modern set theories (such as ZF) usually (NF is a counterexample) do restrict the comprehension only to elements of a previously existing set, just as Python does (only for arbitrary iterators instead of sets).

Second, I put my Python hat. There is no such thing as generator display, not because of an omission or because it was deemed unnecessary by BDFL, but because the thing is a contradiction in itself. The whole idea of a generator is that it _generates_ objects. When you call next on a generator (f(a) for a in S), you can (and usually do) get an object that didn't necessarily exist before. Precisely, you execute some code ("f(a)") and get its result, and you can say that the result was "generated" by the code, even if it refers to an already existing object, and even if the code is simply "look up a name".

On the other hand, the whole idea of displays is to construct a (potentially new) _container_ for _already existing_ contained objects. To make a display, Python has to first accumulate on the stack all the objects, and then BUILD_whatever. IOW, displays are data, generators are code. Python has always respected the fine distinction there.

So, I think it would be wrong to try to further unify generator expressions and comprehensions, just as it was subtly wrong for Guido to try to unify generator functions with ordinary functions (and he kinda admitted that recently, when coroutines got a distinct moniker in front of their "def").
[gravatar]
@Veky: First, even in the modern set theories, the predicate is the heart of a comprehension.

Second, it isn't that important *why* generators don't have displays. My point is that "comprehension" shouldn't be equated with "display". It should be equated with, "an expression to create a bunch of things from an existing bunch of things, possibly with a predicate."

It is certainly true that generator expressions have a great deal in common with the three comprehensions. I think talking about them, and understanding them, will be easier if we rename them to "generator comprehensions."
[gravatar]
I like this. I like this because though I've seen countless articles on comprehensions, covering set, list, and dict comprehensions, I have never seen them cover generator expressions. It seems to me list comprehensions and generator expressions should be discussed right next to each other.

I think that when we choose terms we are affecting how people learn and comprehend Python. By separating these ideas in terms we separate them within people's minds, prevent them from being discussed together by the average user, and prevent them from being considered together as solutions to the same problem; that's clearly incorrect in this case.
[gravatar]
Generator expressions are distinct from comprehensions for the primary reason that they are not evaluated until needed. You can think of comprehensions as distilled expressions.

My question is why didn't they expose dictonary expressions and set expressions ( lazy evaluation ) rather than just the comprehensions ( pre-evaluated )?
[gravatar]
@Eric: they four expressions are distinct, and have differences, sure. My point is that they are more similar than they are different, and having one term to cover those similarities would be very helpful.

As to why there are no lazily-evaluated dicts and sets, I don't see how they could work? It would produce a dict that has just one key/value pair, and then what?
[gravatar]
@Ned,

They are not similar on the backend and should be referred to differently. This is why knowing the difference between range and irange was so critical in earlier releases of python. One returns a block of code the others return datasets.

It does turn out you can use generators for dicts with a tuple, but it would nice to have a unified syntax.
>>> ( x*x for x in range(10) )
#generator object #genexpr> at 0x2abc9e848570>
>>> list( x*x for x in range(10) )
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> set( x*x for x in range(10) )
{0, 1, 64, 4, 36, 9, 16, 49, 81, 25}
>>> ( (x,x*x) for x in range(10) )
#generator object #genexpr> at 0x2abc9e848570>
>>> dict( (x,x*x) for x in range(10) )
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
[gravatar]
@Eric, ok, we can disagree about what similarities are worth giving common names to.

Your examples of using generators with dict() etc don't seem to get you what you wanted: the entire dict is created before the expression completes, just as it would be with the dict comprehension. Maybe I misunderstood what you were looking for.
[gravatar]
@Eric on second thought: "generator" is the word that underscores the lazy evaluation. You seem to want "comprehension" to mean eager evaluation, and "expression" to mean lazy evaluation. But "expression" is a broader term that encompasses all Python expressions, whether they are lazy or eager. Switching between "comprehension" and "expression" to denote eager vs lazy evaluation doesn't make any sense.

Python expressions in general are not lazy, so you have to look to the word "generator" to convey the laziness. Once you've done that, you might as well use "comprehension" to convey the compact looping-with-conditional syntax that generator comprehensions share with the list, dict, and set comprehensions.
[gravatar]
@Ned,

I see what you are saying. I just think the distinction is very important between eager and lazy evaluation and should be clear. I might be convinced to call them "comprehension generators" so a (list/dict/set) comprehension takes a comprehension generator, thoughts?
[gravatar]
I am not a language lawyer so I'll simply ask a question. Are there not several other "expressions" in Python that are also lazy but don't have a word in their name to convey that fact? IIRC there are.

It seems to me that the similarity of the syntax is more important here than noting the differences of each form with a different word. To someone not steeped in the details (a language learner perhaps), it seems easier to learn and remember the details of each later than try to remember each's name so you can look it up in an index. Oops, excuse me, today we google. ;-)

I still believe all four should be "comprehensions" and that one is for lists, another for sets, one is lazy, and this one over here is not. When you iterate over them, they each provide the next item.
[gravatar]
I'll stick with my original proposal: they are all comprehensions, of four different kinds, depending on what they make: list, dict, set, or generator.
[gravatar]
I still disagree about modern set theories, but that's subjective.

But I've been researching a bit. You have an IMO strong argument for your thesis you haven't mentioned at all: the Python's grammar specification. It has nonterminals (which govern productions of generator expressions too)
comp_iter: comp_for | comp_if
comp_for: 'for' exprlist 'in' or_test [comp_iter]
comp_if: 'if' test_nocond [comp_iter]
... whose names start with "comp_", which obviously refers to the "comprehensions". On the other hand, "displays" aren't mentioned at all, the closest thing (different from "comp") is "maker", in "dictorsetmaker".

So, it seems that at least when it comes to specifying the grammar, people _do_ put these concepts together. And when they need a simple term to name the joint concept, they use "comp". IMO, that's the strongest argument for your thesis.
[gravatar]
@Ned

You mention the "four" comprehensions, which you seem to associate the literal options, but I find consumption of generator expressions by builtins is more generalized:
    (x*x for x in range(10))
    list(x*x for x in range(10))           
    set(x*x for x in range(10))             
    dict((x, x*x) for x in range(10))
    tuple(x*x for x in range(10))
Indeed they do have an awful lot in common. Would you classify these "consumption options" as separate comprehensions as well? Notice, this is the only way to get a tuple w/o literals, and there are five options listed.
[gravatar]
@pylang Your other uses of generator comprehensions are still just generator comprehensions. You could have as easily put square brackets in all of those examples and used list comprehensions in all those places. f(2) and g(2) aren't different kinds of 2, just different uses of 2.
[gravatar]
@Ned
why there is no tuple comprehension ? generator expression looks like tuple comprehension
[gravatar]
@sachkh: there are no tuple comprehensions because tuples are meant to be used where you know the number of elements based on the data structure, and each element has a distinct meaning, like a C struct. That usage doesn't mesh with the comprehension structure, where the number of elements is unknown, and all elements are the same semantically.

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.