Python Names and Values

This is a presentation I gave at PyCon 2015 in Montreal. You can read the slides and text on this page, or open the actual presentation in your browser (use right and left arrows to advance the slides). The figures in the presentation are animated, which you won’t see on this page; click through the presentation to see them in their full glory. You can also watch the video of me presenting it:

Watch the video on YouTube

Also, this is a re-working of an earlier piece with the same name: Facts and myths about Python names and values. It covers the same ideas, but with different text and figures.

Facts and Myths about Names and Values

This talk is about some fundamental concepts in Python: names and values.

These lessons apply to both Python 2 and Python 3.

Python is simple

Python is a very approachable language. Often it works just as you expect if you come to it from other languages. But you might suddenly encounter surprising behavior.

The underlying mechanisms of Python are often quite simple, but their combined effects might not be what you expect. By understanding the mechanisms, you can reason about their effects.

Today I’ll be talking about names, values, assignment, and mutability.

Names refer to values

As in many programming languages, a Python assignment statement associates a symbolic name on the left-hand side with a value on the right-hand side. In Python, we say that names refer to values, or a name is a reference to a value:

x = 23

Now the name “x” refers to the value 23. The next time we use the name x, we’ll get the value 23:

print(x)        # prints 23

Another way people describe this is, x is bound to 23.

Exactly how the name refers to the value isn’t really important. If you’re experienced with the C language, you might like to think of it as a pointer, but if that means nothing to you then don’t worry about it.

To help explain what’s going on, I’ll use diagrams. A gray rectangular tag-like shape is a name, with an arrow pointing to its value. The slide shows the name x referring to an integer 23.

Many names can refer to one value

There’s no rule that says a value can only have one name. An assignment statement can make a second (or third, ...) name refer to the same value.

x = 23
y = x

Now x and y both refer to the same value.

Neither x nor y is the “real” name. They have equal status: each refers to the value in exactly the same way.
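
One way to check this for yourself is the is operator, which tests whether two names refer to the very same object rather than merely equal values. A minimal sketch:

```python
x = 23
y = x

# "is" compares identity: do both names refer to one object?
same_object = x is y
print(same_object)      # prints True
```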

Names are reassigned independently

If two names refer to the same value, this doesn’t magically link the two names. Reassigning one of them won’t reassign the other also:

x = 23
y = x
x = 12

When we said “y = x”, that doesn’t mean that they will always be the same forever. Reassigning x leaves y alone. Imagine the chaos if it didn’t!

Also important to note is that Python has no mechanism for making a name refer to another name. That is, there is no way to make y be a permanent alias for x no matter how x is reassigned.
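
Printing both names after the reassignment makes the independence concrete. This is just the three lines above with two prints added:

```python
x = 23
y = x
x = 12      # rebinds x only; y is left alone

print(x)    # prints 12
print(y)    # prints 23
```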

Values live until no references

Python keeps track of how many references each value has, and automatically cleans up values that have none. This is called “garbage collection,” and means that you don’t have to get rid of values; they go away by themselves when they are no longer needed.

Exactly how Python keeps track is an implementation detail, but if you hear the term “reference counting,” that’s an important part of it. Sometimes cleaning up a value is called reclaiming it.
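
If you want to watch a value being reclaimed, the standard weakref module offers one way to observe it without keeping the value alive. This is only a sketch: the Value class is a made-up stand-in, and reclaiming the value immediately after the last del is CPython’s reference-counting behavior, which other implementations may not share.

```python
import weakref

class Value:
    """Hypothetical stand-in for any value (plain object() can't be weakly referenced)."""

v = Value()
alias = v                   # a second name for the same value
watcher = weakref.ref(v)    # a weak reference doesn't count as a reference

del v                       # one name is gone, but alias still refers to the value
still_alive = watcher() is not None

del alias                   # the last reference is gone
reclaimed = watcher() is None   # True on CPython, where the value is reclaimed at once

print(still_alive, reclaimed)
```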

Assignment never copies data

An important fact about assignment: assignment never copies data.

When values have more than one name, it’s easy to get confused and think of it as two names and two values:

x = 23
y = x
# "Now I have two values: x and y!"
# NO: you have two names, but only one value.

Assigning a value to a name never copies the data, and it never makes a new value. Assignment just makes the name on the left refer to the value on the right. In this case, we have only one 23, and x and y both refer to it, just as we saw in the last diagrams.

Things get more interesting when we have more complicated values, like a list:

nums = [1, 2, 3]

Now if we assign nums to another name, we’ll have two names referring to the same list:

nums = [1, 2, 3]
other = nums

Remember: assignment never makes new values, and it never copies data. This assignment statement doesn’t magically turn my list into two lists.

At this point, we have one list, referred to by two names, which can lead to a big surprise.

Changes are visible through all names

Here we create our list again, and assign it to both nums and other. Then we append another value to nums. When we print other, it has the 4 appended to it. This surprises many people.

Because there was only ever one list, both names see the change. The assignment to other didn’t make a copy of the list, and the append operation didn’t copy the list before appending the value. There is only one list, and if you modify it through one of its names, the change will be visible through all of the names.
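
The slide’s code isn’t reproduced on this page. Going by the description above, it would look something like this:

```python
nums = [1, 2, 3]
other = nums        # a second name for the same list; no copy is made
nums.append(4)      # mutates the one list in place
print(other)        # prints [1, 2, 3, 4]
```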

Mutable aliasing!

This is an important enough effect, and one that is surprising to enough people, that it bears repeating. If a mutable value has more than one name, and the value changes, then all names see the change.

In the small code samples we’ve been looking at, it’s easy to see where there are two names for one list. But as we’ll see with function calls, the two names could be quite far apart, and the change in value very far from the surprise discovery.

I’ve used the word “mutable” here for the first time. It means a value that can change in place. In our code sample, the name nums refers to the same object through all four lines of code. But the value that object contains has changed.

Immutable values can't alias

Not all values in Python are mutable. Numbers, strings, and tuples are all immutable. There is no operation on any of these values that can change them in place. All you can do is make new objects from old objects.

In our three-line code sample here, x refers to the string “hello”. Then y also refers to that string. In the last line, the + operator doesn’t extend the existing string; it creates an entirely new string by concatenating “hello” and “ there”. Then x refers to that new string.

The old string is unaffected. This is guaranteed with strings because they are immutable. No method on a string modifies the string; they all return new strings.

The aliasing problem can’t happen with immutable values because one of the conditions is impossible: you can’t change the value in place.
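
The three-line sample from the slide isn’t reproduced on this page. Reconstructed from the description, with two prints added to show that the old string is unaffected, it would be roughly:

```python
x = "hello"
y = x
x = x + " there"    # builds a new string and rebinds x to it

print(x)            # prints hello there
print(y)            # prints hello
```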

"Change" is unclear

One of the difficulties in talking about these concepts is the word “change”. Informally, we say that adding 1 to x “changes” x:

x = x + 1

We also say that appending to num “changes” num:

num.append(7)       # changes num

But these are two very different operations. The first is rebinding x: x+1 creates an entirely new object, and then the assignment statement makes x refer to it. Appending to num is mutating num: the name num still refers to the same object, but that object has been modified, and its value has been updated in place.

The words “rebinding” and “mutating” are too awkward to use all the time, but when we’re doing a close reading of a piece of code to understand how it works, they can be very useful for distinguishing between the two different kinds of change.

Of course, you can also rebind a name that refers to a list. This is another way to have nums become a list with a 7 on the end:

nums = nums + [7]

As with integers, the + operator here makes an entirely new list, and then the name nums is rebound to it.

On the other hand, there is no way to mutate a number. Numbers are immutable: literally, they cannot be mutated.
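
When doing that kind of close reading, keeping a second name around and comparing with is can tell the two kinds of change apart. A small sketch (the name original is only there for the comparison):

```python
nums = [1, 2, 3]
original = nums             # second name, kept so we can compare identities

nums.append(7)              # mutating: same object, new contents
print(nums is original)     # prints True
print(original)             # prints [1, 2, 3, 7]

nums = nums + [8]           # rebinding: nums now refers to a brand-new list
print(nums is original)     # prints False
print(original)             # prints [1, 2, 3, 7]
```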

Mutable and immutable are assigned the same

One of the common misconceptions about Python is that assignment works differently for mutable values than for immutable values. This is not true. Assignment is very simple: it makes the name on the left side refer to the value on the right side.

When people say that assignment works differently, they are misdiagnosing the mutable aliasing problem. They see two different effects in two pieces of code, and know that one involves a mutable value, and the other an immutable value, and mistakenly believe that it’s the assignment step that differs between the two pieces of code.

Python’s underlying mechanisms are often much simpler than people give them credit for. Assignment always does exactly the same thing, regardless of the value on the right-hand side.

Assignment variants

Python provides other kinds of assignment than the simple equals sign. For example, both numbers and lists offer += .

Conceptually, these two lines are the same:

x += y
x = x + y

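In Python, though, += is implemented by the value x itself. When x’s type provides an __iadd__ method, these two lines are actually the same (when it doesn’t, as with numbers, Python falls back to x = x + y):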

x += y
x = x.__iadd__(y)

(to be completely pedantic, it’s the same as “x = type(x).__iadd__(x, y)”, because the method will not be found on the object itself, only on the class.) The meaning of += depends on the type of x, because that value provides the implementation of __iadd__ that will be used.

For numbers, += works just as you’d expect. But lists give us another surprise. With lists, “nums = nums+more” will rebind nums to a new list formed by concatenating nums and more. But “nums += more” actually modifies nums in-place, as a mutating operation.

The reason is that list implements __iadd__ like this (except in C, not Python):

class List:
    def __iadd__(self, other):
        self.extend(other)
        return self

When you execute “nums += more”, you’re getting the same effect as:

nums = nums.__iadd__(more)

which, because of the implementation of __iadd__, acts like this:

nums.extend(more)
nums = nums

So there is a rebinding operation here, but first, there’s a mutating operation, and the rebinding operation is a no-op.
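
A second name for the list makes the difference between the two spellings visible. This is a sketch using the same kind of aliasing as earlier:

```python
nums = [1, 2, 3]
other = nums            # second name for the same list

nums += [4, 5]          # list.__iadd__ extends in place, then nums is rebound to the same list
print(other)            # prints [1, 2, 3, 4, 5]
print(nums is other)    # prints True

nums = nums + [6]       # + builds a new list, and nums is rebound to it
print(other)            # prints [1, 2, 3, 4, 5]
print(nums is other)    # prints False
```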

The moral of the story here is to understand the behavior of the primitives you are using!

References can be more than just names

All of the examples I’ve been using so far used names as references to values, but other things can be references. Python has a number of compound data structures each of which hold references to values: list elements, dictionary keys and values, object attributes, and so on. Each of those can be used on the left-hand side of an assignment, and all the details I’ve been talking about apply to them. Anything that can appear on the left-hand side of an assignment statement is a reference, and everywhere I say “name” you can substitute “reference”.

In the diagrams of lists so far, I’ve shown numbers stored in the boxes, but really, each element is a reference, so they should really be drawn like this, with arrows from the boxes to values outside of them:

[Figure: nums refers to a list, which refers to ints. Code shown: nums = [1, 2, 3]]

But that gets complicated quickly, so I’ve used a visual shorthand, with the numbers in the boxes themselves.

[Figure: nums refers to a list of numbers. Code shown: nums = [1, 2, 3]]

Lots of things are references

Here are some other assignments. Each of these left-hand sides is a reference:

my_obj.attr = 23
my_dict[key] = 24
my_list[index] = 25
my_obj.attr[key][index].attr = "etc, etc"

and so on. Lots of Python data structures hold values, and each of those is a reference. All of the rules here about names apply exactly the same to any of these references. For example, the garbage collector doesn’t just count names, it counts any kind of reference to decide when a value can be reclaimed.

Note that “i = x” assigns to the name i, but “i[0] = x” does not assign to the name i. It assigns to the first element of i’s value. It’s important to keep straight what exactly is being assigned to. Just because a name appears somewhere on the left-hand side of the assignment statement doesn’t mean the name is being rebound.
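
Here is a small sketch of that distinction. The name before is an extra name I’ve added so the two cases can be compared:

```python
i = [1, 2, 3]
before = i              # second name for the list, kept for comparison
x = "new value"

i[0] = x                # assigns to element 0 of i's value; the name i is not rebound
print(i is before)      # prints True
print(before)           # prints ['new value', 2, 3]

i = x                   # assigns to the name i itself
print(i is before)      # prints False
print(before)           # prints ['new value', 2, 3]
```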

Lots of things are assignments

Just as many things can serve as references, there are many operations in Python that are assignments. Each of these lines is an assignment to the name X:

X = ...
for X in ...
[... for X in ...]
(... for X in ...)
{... for X in ...}
class X(...):
def X(...):
def fn(X): ... ; fn(12)
with ... as X:
except ... as X:
import X
from ... import X
import ... as X
from ... import ... as X

I don’t mean that these statements act kind of like assignments. I mean that these are assignments: they all make the name X refer to a value, and everything I’ve been saying about assignments applies to all of them uniformly.

For the most part, these statements define X in the same scope as the statement, but not all of them, especially the comprehensions, and the details differ slightly between Python 2 and Python 3. But they are all real assignments, and every fact about assignment applies to all of them.
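
A few of these forms side by side, as a rough illustration that the names they create behave like any other assigned name. The names m, double, twice, and item are arbitrary choices for this sketch:

```python
import math as m            # binds the name m to the math module

def double(n):              # binds the name double to a function object
    return n * 2

twice = double              # an ordinary assignment: a second name for the same function

for item in [1, 2, 3]:      # binds item to each element in turn
    pass

print(twice is double)      # prints True
print(twice(m.floor(2.5)))  # prints 4
print(item)                 # prints 3
```

The loop name item is still bound after the loop ends, just as it would be after a plain assignment.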

For loops

For-loops are an interesting example. When you write code like this:

for x in sequence:
    something(x)

it executes kind of like this:

x = sequence[0]
something(x)
x = sequence[1]
something(x)
# and so on...

The actual mechanics of getting values from the sequence is more involved than simple indexing like I’ve shown here. But the point is that each element in the sequence is assigned to x just as if it had been done with a simple assignment statement. And again, all the rules about assignments and how they work apply to this assignment.
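
For the curious, a closer approximation of those mechanics uses the built-in iter and next functions. This is a sketch rather than exactly what the interpreter does, and something is a placeholder that just records what it was given:

```python
seen = []

def something(value):
    # Hypothetical placeholder for the loop body.
    seen.append(value)

sequence = ["a", "b", "c"]

iterator = iter(sequence)
while True:
    try:
        x = next(iterator)   # each element is assigned to the name x
    except StopIteration:
        break                # the iterator is exhausted, so the loop ends
    something(x)

print(seen)                  # prints ['a', 'b', 'c']
```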

For loops

Let’s say we have a list of numbers, and we want to multiply them all by 10, so if we started with [1, 2, 3], we want to modify the list to become [10, 20, 30]. A simple approach might be this:

nums = [1, 2, 3]
for x in nums:          # x = nums[0] ...
    x = x * 10
print(nums)             # [1, 2, 3]   :(

but it doesn’t work. To see why, remember that on the first iteration, x is another name for nums[0]. As we learned earlier, when you have two names referring to one value, and you reassign one of the names, the other name doesn’t also change. In this case, we reassign x (with “x = x * 10”), so x is now 10, but nums[0] still refers to the old value, 1.

Our loop never modifies the original list because we are simply reassigning the name x over and over again.
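
If you really do need to modify the list in place, the assignment has to target the list element rather than the name x. One way to do that, sketched here with enumerate:

```python
nums = [1, 2, 3]
for i, x in enumerate(nums):
    nums[i] = x * 10    # assigns to the element reference nums[i], not to the name x
print(nums)             # prints [10, 20, 30]
```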

The best advice is to avoid mutating lists, and instead to make new lists:

nums = [ 10*x for x in nums ]

Function arguments are assignments

Function arguments are perhaps the most important thing that doesn’t look like an assignment, but is. When you define a function, you specify its formal parameters, as x is here:

def func(x):
    print(x)

When you call a function, you provide actual argument values:

num = 17
func(num)
print(num)

Here num is the value supplied for the parameter x. When the function is called, we get the exact same behavior as if we had executed “x = num”. The actual value is assigned to the parameter.

Each function call creates a stack frame, which is a container for the names local to that function. The name x is local to the function, but the assignment semantics are the same.

When we are inside func, there is a value (17) with two names: num in the calling frame, and x in the function’s frame.

When the function returns, the function’s stack frame is destroyed, which destroys the names it contains, which may or may not reclaim values if they were the last names referring to the values.
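
A short sketch of both halves of that idea: the parameter starts out as another name for the caller’s value, and rebinding it inside the function leaves the caller’s name alone. The function names here are made up for illustration:

```python
num = 17

def is_same_value(x):
    # x is a local name for whatever value the caller passed in
    return x is num

def rebind(x):
    x = x + 1       # rebinds the local name x only
    return x

print(is_same_value(num))   # prints True
print(rebind(num))          # prints 18
print(num)                  # prints 17
```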

Let’s try to write a useful function. We want to append a value twice to a list. We’ll write it three different ways. Two of them work (but differently), and one is completely useless. It will be instructive to understand why the three versions behave differently.

Here is our first version of append_twice:

def append_twice(a_list, val):
    a_list.append(val)
    a_list.append(val)

This is very simple, and does exactly what it claims to do.

We call it like this:

nums = [1, 2, 3]
append_twice(nums, 7)
print(nums)         # [1, 2, 3, 7, 7]

When we call append_twice, we pass it nums. This assigns nums to the parameter a_list, so now there are two names for the list, nums in the caller, and a_list in the append_twice function. Then we append val onto the list twice. These appends work on a_list, which is the same list as nums in the caller, so we are directly modifying the caller’s list.

When the function ends, the frame is destroyed, which removes the local name a_list. Since that wasn’t the only reference to the list, removing the name a_list doesn’t reclaim the list itself.

Back in the caller, we print the nums list, and see that it has indeed been modified.

Now let’s try a different implementation:

def append_twice_bad(a_list, val):
    a_list = a_list + [val, val]
    return

nums = [1, 2, 3]
append_twice_bad(nums, 7)
print(nums)         # [1, 2, 3]

Here we take another approach: inside the function, we extend the list by adding two values to the original list. But this function doesn’t work at all. As before, we pass nums into the function, so we have both a_list and nums referring to the original list. But then we create an entirely new list, and assign it to a_list.

When the function ends, the frame is destroyed, and the name a_list along with it. The new augmented list was only referred to by a_list, so it is also reclaimed. All of our work is lost!

Back in the caller, we see no effect at all, because the original list was never modified. The impulse here was good: functions that modify values passed into them are good ways to get unexpected aliasing bugs. Making new lists can avoid these problems. Luckily, there’s an easy thing we can do to make this function useful.

Here we have nearly the same function, but once we’ve made our new a_list, we return it:

def append_twice_good(a_list, val):
    a_list = a_list + [val, val]
    return a_list

nums = [1, 2, 3]
nums = append_twice_good(nums, 7)
print(nums)         # [1, 2, 3, 7, 7]

Then in the caller, we don’t simply call append_twice_good, we assign its return value to nums again. The function has made a new list, and by returning it, we can decide what to do with it. In this case, we want nums to be the new list, so we simply assign it to nums. The old value is reclaimed, and the new value remains.

Three append_twice functions

Here are the three versions of the function, for comparison. The first function mutates the list in-place. The second version makes a new list but fails to do anything useful with it. The third version makes a new list, and returns it so that we can use it.
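
The slide showing them together isn’t reproduced on this page, so here they are again in one runnable piece, with the same calls as above:

```python
def append_twice(a_list, val):
    """Mutates the caller's list in place."""
    a_list.append(val)
    a_list.append(val)

def append_twice_bad(a_list, val):
    """Makes a new list, but it is lost when the function returns."""
    a_list = a_list + [val, val]
    return

def append_twice_good(a_list, val):
    """Makes a new list and returns it to the caller."""
    a_list = a_list + [val, val]
    return a_list

nums = [1, 2, 3]
append_twice(nums, 7)
print(nums)                         # prints [1, 2, 3, 7, 7]

nums = [1, 2, 3]
append_twice_bad(nums, 7)
print(nums)                         # prints [1, 2, 3]

nums = [1, 2, 3]
nums = append_twice_good(nums, 7)
print(nums)                         # prints [1, 2, 3, 7, 7]
```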

Any name → any value @ any time

Python is dynamically typed, which means that names have no type. Any name can refer to any value at any time. A name can refer to an integer, and then to a string, and then to a function, and then to a module. Of course, this could be a very confusing program, and you shouldn’t do it, but the Python language won’t mind.
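
For instance, this deliberately confusing sketch runs without complaint:

```python
import math

name = 12               # an integer
name = "twelve"         # now a string
name = len              # now a built-in function
name = math             # now a module

print(name.sqrt(144))   # prints 12.0
```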

Names have no type

Just as names have no type, values have no scope. When we say that a function has a local variable, we mean that the name is scoped to the function: you can’t use the name outside the function, and when the function returns, the name is destroyed. But as we’ve seen, if the name’s value has other references, it will live on beyond the function call. It is a local name, not a local value.

Values can be created, used, and ultimately destroyed all up and down the call stack. There is no rule about when values can be created, and how long they can live.
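
A minimal sketch of a local name whose value outlives it (make_list, local_name, and kept are names chosen for this example):

```python
def make_list():
    local_name = [1, 2, 3]  # the name is local to the function
    return local_name       # the value is handed back to the caller

kept = make_list()
print(kept)                 # prints [1, 2, 3]
# The name local_name no longer exists here, but the list it referred to lives on as kept.
```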

Other topics

Here are a number of quick topics that I didn’t have time to cover in depth.

“Python has no variables” In all honesty, the original reason I started writing this talk was because I would see people try to explain how Python works by saying that Python has no variables. This is obviously wrong. What they mean is that Python variables work differently than variables in C, though even that statement is hard to understand.

At one time, teaching Python pretty much meant teaching it to people who knew C. That need is dwindling over time. But even if we are dealing with many C refugees, there’s no reason to let C own the word “variable”, leaving Python to come up with some other word. It’s OK for two languages to use the same word in slightly different ways.

Thankfully, this mantra is falling out of favor!

Call by value or call by reference? People also get tangled up trying to explain whether Python uses call by value, or call by reference. The answer is: neither. This is a false dichotomy, again because people are trying to mis-apply concepts from one language onto all languages.

Python uses a calling convention a little different from either pure call by value or pure call by reference. Some people call it call by object, or call by assignment.

Making a 2D list: the simple way to make a nested list (let’s say an 8x8 board for checkers) doesn’t work. You get a list of eight zeros, but the outer list has eight references to that single list, rather than eight separate lists of zeros. The second form gives you the correct structure. You can read more about that in this blog post: Names and values: making a game board.
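
The two forms from the slide aren’t reproduced on this page. The sketch below uses the usual idioms, which I’m assuming are the ones meant:

```python
# First form: one inner list, referenced eight times by the outer list.
board = [[0] * 8] * 8
board[0][0] = 1
print(board[1][0])              # prints 1
print(board[0] is board[1])     # prints True

# Second form: the comprehension builds a separate inner list for each row.
board = [[0] * 8 for _ in range(8)]
board[0][0] = 1
print(board[1][0])              # prints 0
print(board[0] is board[1])     # prints False
```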

pythontutor.com is a site that lets you enter your own Python code, and it will diagram and animate its behavior. The diagrams aren’t as pretty as mine, but it works on your code!

Questions?

Thanks for your attention, I hope this has been helpful.

Python’s mechanisms are very simple, often simpler than people believe. Understand them, and your programs will not surprise you in bad ways!

Comments

[gravatar]
Hey Ned, thank you for a solid and clarifying Python walk-through today.
[gravatar]
Thanks for your talk Ned! ;)
[gravatar]
Dean Jay Mathew 1:20 AM on 13 Apr 2015
I learned much from this talk, plus it got me through the first stretch of the traffic laden Jakarta commute. Will be great to catch up RE our OPENedX instance soon as we are getting closer to launch. Dean
[gravatar]
Great talk, thanks!

Very glad to learn about that last bit about the keyword arguments - I'm sure that would have eventually caused me some debugging pain :)
[gravatar]
Hi, Ned, I have a question.
>>> a = ([],)
>>> a[0] += [1,2]

Traceback (most recent call last):
  File "", line 1, in 
    a[0] += [1,2]
TypeError: 'tuple' object does not support item assignment
>>> a[0].extend([1,2])
>>> a[0]
[1, 2, 1, 2]
Though += does change a[0]'s content, applying it on a tuple element raises an exception. So += and extend are not completely equal. What do you think?
[gravatar]
@laike9m: right, += and .extend are not identical. The example I showed was that += is like .extend followed by =. The += fails here because of the second step: a[0]= isn't supported by tuples.
[gravatar]
May I suggest a small correction: When you say »it's the same as "x = type(x).__iadd__(y)"« you may want to change that to »it's the same as "x = type(x).__iadd__(x, y)"« as you have to explicitly pass the »self« argument when using the raw function on the class.
[gravatar]
@Niels: thanks for pointing that out, and sorry it's taken me so long to fix it!
[gravatar]
Ned, Thanks for the great explanation. I know this is an unusual request - but is there any chance you have the presentation as a powerpoint/pdf that I can use in a training seminar ?
Thanks
Ravi
[gravatar]
@Ravi, I don't have other formats for this presentation, but you can use the HTML slides, as long as you leave my name on them.
[gravatar]
Hi!

Thanks for the helpful presentation!
I've a completely different question: which software did you use for creating your presentation?

thanks
Chris
[gravatar]
@chris: I get this question enough that I wrote a blog post about it: https://nedbatchelder.com/blog/201504/how_i_make_presentations.html
[gravatar]
As far as I understand, strings aren't truly immutable.

This for example prints me the same id twice:
s = 'c'
s = s + 'c'
print(id(s))
s = s + 'c'
print(id(s))
And `timeit("s += 'c'", "s = ''")` takes my PC about 0.15 seconds while `timeit("t = s; s += 'c'", "s = ''")` takes 121 seconds. And you can't blame it on the extra assignment work, because `timeit("t = s; t = None; s += 'c'", "s = ''")` does more of that but only takes 0.15 seconds.

As far as I understand, Python (at least CPython) actually changes string objects in place in certain cases where that's possible. In that timeit example, `t = s` prevents it because otherwise `t` would be affected by the change. Also, it apparently won't do it when it runs out of capacity, as the following code shows:
s = ''
for _ in range(100):
    old_id = id(s)
    s += 'c'
    if id(s) != old_id:
        print(len(s))
That prints the lengths where the id changed, and it prints me the lengths 1, 2, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88 and 96. At larger lengths, the steps are somewhat erratic, for example I get an id change from length 488 to 489 but *no* id change from length 784 all the way up to length 2696.

Alex Martelli talked about this optimization here:
https://stackoverflow.com/a/1350289/1672429
[gravatar]
Your example demonstrates that CPython has the freedom to reuse objects, or object addresses. It doesn't show that strings are mutable. CPython understands that the object is only referenced by one name, and so it can take a shortcut without altering the semantics of the language.
[gravatar]
Not sure what your argument is. If I looked at the source code and showed you that `s += 'c'` only wrote that character into that string's internal character array and increased its internal length value, would you still say the string didn't get changed?
[gravatar]
The operations CPython performs are isomorphic to making a new string with the new value at the same address that the old object was at. CPython is allowed to take shortcuts. Other than the timing, there is no observable difference. Strings are immutable.

Have you tried this?
$ python2.7
>>> id(object())
4339343504
>>> id(object())
4339343504
[gravatar]
The __add__ on a list type does rebinding as I understood from code below:
You mention its mutate in this case:

x = [1]; y = [2]
print("id before=",id(x))
x +=y; # mutation happens
print("id after =",id(x))

x = [1]; y = [2];
print("id before=",id(x))
x = type(x).__add__(x,y);
print("id after =",id(x))

Output:
id before= 4459500424
id after = 4459500424
id before= 4459503048
id after = 4459964040
[gravatar]
@Ramnik: __add__ doesn't rebind, it just produces a new list. You are assigning the result of __add__ back to the variable. That assignment is doing the rebinding.
[gravatar]
So it is __add__ is definitely not mutating then. From what I understand, you mention __add__ is mutating. Your explanation on this topic in the slides' explanation above needs more clarity. Your answer has further confused me about the difference between simply creating a new list , rebinding and mutating.
[gravatar]
@Ramnik, I'm sorry but I don't know what you are referring to. I can't find a mention of __add__. I say, "With lists, “nums = nums+more” will rebind nums", but that is about the assignment rebinding.
[gravatar]
Is there a way to subscribe to your blog or site?
[gravatar]
Great article, thanks!
[gravatar]
As an adult learner of Python, coming from medical background without any computing science experience, I found this extremely useful. I have written about my own mental model based on my research, as a post involving Genie, recycle bin and sleeping princesses. I suspect I might be missing something big, but it helped me to grok as a beginner python programmer. Your feedback will be much appreciated.
The story I have written is on GitHub with some example code.
Thanks a lot,
Satya Saladi.
[gravatar]
Albrecht Kadauke 4:23 PM on 7 Dec 2019
To introduce the difference between "rebinding" and "mutating" is very helpful! Thank you very much, Ned!
[gravatar]
a=2000
b1=1000+1000
c=1000
b2=c+c
print(id(a))
print(id(b1))
print(id(b2))



Why do b1 and b2 get bound to different objects here?
[gravatar]
"b1" is bound to the same object as "a" but in the case of b2 a new object is being created
[gravatar]
Python doesn't require that equal ints always be represented by the same object. In this case, "1000+1000" is constant-folded by the compiler, and compiled to the same object as "2000" at compile time. Then at run time, another 2000 is created by adding.
[gravatar]
Hello Ned,

Thank you for this talk! And an aside, thank you for helping the broader Python community. I see you a lot in Python Discord and it's so cool to see an expert in the community so down-to-earth and helping out. :)

What do you think of summarizing it this way: In python, names always hold references. Whether,

- data being mutable or immutable
- an operation makes a copy of the data (thus, a new reference) or doesn't

would tell us whether an "assignment" (`=`) operation will affect multiple names or not.

Is this understanding accurate?
