Python tuning tips

Wednesday 26 April 2006

This is close to 19 years old. Be careful.

Here are a few things I learned yesterday while making a Python program go faster:

1: When sorting a list, you can provide a comparison function, often in the form of a lambda:

mylist.sort(lambda x,y: -cmp(x.computeValue(), y.computeValue()))

This is cool because you can control how values are sorted. But it’s bad because the function is invoked for every comparison of pairs in the list. If computeValue is truly intensive (for example, if it queries your database), there’s a lot of work going on. Also, why’d I have to repeat “computeValue()” in the lambda?

Turns out I didn’t. Since Python 2.4, sort also has a key= argument, which is a function of one element which returns the key to use for sorting the element:

mylist.sort(key=lambda x: x.computeValue(), reverse=True)

The key function is called once per element, and the returned values are stored and used for the comparisons. The reverse=True argument is also new in 2.4. It forces the sort in the other direction, replacing the negative cmp trick shown above, or worse, the x,y then y,x trick I’ve sometimes seen in comparison functions.

I made this one-line change and saved myself 1000 database queries!
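To make the difference concrete, here is a rough sketch that counts how often computeValue() runs under each approach. It is written for current Python 3, where sort no longer accepts a comparison function directly, so functools.cmp_to_key stands in for the old style. The Item class and its call counter are hypothetical, invented only for this illustration.

```python
from functools import cmp_to_key

class Item:
    """Hypothetical stand-in for an object whose computeValue() is expensive."""
    calls = 0  # counts computeValue() invocations across all items

    def __init__(self, value):
        self.value = value

    def computeValue(self):
        Item.calls += 1
        return self.value

def compare_desc(x, y):
    # Python 3 has no cmp(); (a > b) - (a < b) is the usual replacement.
    a, b = x.computeValue(), y.computeValue()
    return -((a > b) - (a < b))

items = [Item(v) for v in [5, 3, 9, 1, 7, 2, 8, 6]]

# Comparison-function style: computeValue() runs twice per comparison.
Item.calls = 0
by_cmp = sorted(items, key=cmp_to_key(compare_desc))
cmp_calls = Item.calls

# key= style: computeValue() runs exactly once per element.
Item.calls = 0
by_key = sorted(items, key=lambda x: x.computeValue(), reverse=True)
key_calls = Item.calls

print("comparison function: %d calls, key=: %d calls" % (cmp_calls, key_calls))
```

The key= version always makes one call per element, while the comparison version makes two calls per comparison, and the number of comparisons grows faster than the length of the list.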

2: If you are measuring the time taken by a chunk of Python code, it matters what platform you are running on. On Windows, time.clock() is the wall time, but on Unix, it is the processor time. As the timeit module shows, time.time() is the best option for wall time on Unix. It makes a huge difference to measure wall time rather than processor time.

We spent a while yesterday trying to find a missing half-second. It turns out it was all the time that our process wasn’t executing (for example, waiting for the database)!
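A small sketch of how the two kinds of time diverge, assuming a current Python 3 where time.clock() no longer exists: time.perf_counter() measures wall time and time.process_time() measures processor time. The measure helper and the sleeping function are hypothetical, with the sleep standing in for a wait on the database.

```python
import time

def measure(func):
    """Return (wall_seconds, cpu_seconds) for one call to func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def fake_database_wait():
    # Hypothetical stand-in for waiting on a database: sleeping uses
    # wall time but almost no processor time.
    time.sleep(0.2)

wall, cpu = measure(fake_database_wait)
print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

The processor-time figure misses nearly all of the wait, which is exactly how a half-second can go missing.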

3: If you have a time in fractional seconds, and you want to display it in milliseconds, you multiply by 1000, but don’t do it like this:

print "Elapsed time: %d" % secs*1000

Because that will print:

Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0

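and so on, 1000 times. The % and * operators have the same precedence and group left to right, so the string is formatted first and the result is then repeated 1000 times. What you really want is: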

print "Elapsed time: %d" % (secs*1000)
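The same behavior can be checked in current Python 3, where print is a function. This is only a sketch: the value of secs is made up, and the repeat count is cut to 3 to keep the result short.

```python
secs = 0.25  # made-up elapsed time in seconds

# Without parentheses the string is formatted first, then repeated.
wrong = "Elapsed time: %d" % secs * 3

# With parentheses the multiplication happens before formatting.
right = "Elapsed time: %d" % (secs * 1000)

print(wrong)
print(right)
```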

Comments

[gravatar]
You can also improve on what you have at the bottom. To demonstrate:

In [1]: x = .123567

In [2]: print "%d" % (x*1000)
123

When what you really want is 124. So:

In [3]: round(x, 3)
Out[3]: 0.124

In [4]: print "%d" % (round(x,3) * 1000)
124

works better for all involved.
[gravatar]
Thank you for the sort key pointer! I'm sorting search results by their associated tag count and each obj.tags.filter(...).count() is a db hit. Thanks again!
[gravatar]
> I made this one-line change and saved myself 1000 database queries!

/shudder

For the love of Raymond, please Google for "decorate-sort-undecorate" and save 999 more.
[gravatar]
key=getattr(computeValue)
[gravatar]
Footnote: you can get the string formatter to do the rounding for you:

>>> x = .123567
>>> print "%.0f" % (x * 1000)
124
[gravatar]
sure can - but I prefer the explicit way rather than the string formatting way. I always have to look up the string formatting options, but I can remember round() very easily.
