Wednesday 26 April 2006 — This is close to 19 years old. Be careful.
Here are a few things I learned yesterday while making a Python program go faster:
1: When sorting a list, you can provide a comparison function, often in the form of a lambda:
mylist.sort(lambda x,y: -cmp(x.computeValue(), y.computeValue()))
This is cool because you can control how values are sorted. But it’s bad because the function is invoked for every comparison of pairs in the list. If computeValue is truly intensive (for example, if it queries your database), there’s a lot of work going on. Also, why’d I have to repeat “computeValue()” in the lambda?
Turns out I didn’t. Since Python 2.4, sort also has a key= argument: a function of one element that returns the key to use when sorting that element:
mylist.sort(key=lambda x: x.computeValue(), reverse=True)
The key function is called once per element, and the returned values are stored and used for the comparisons. The reverse=True argument is also new in 2.4. It sorts in the other direction, replacing the negative cmp trick shown above, or worse, the x,y then y,x trick I’ve sometimes seen in comparison functions.
I made this one-line change and saved myself 1000 database queries!
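To make the difference concrete, here is a small sketch that counts how often computeValue() runs under each approach. The Item class and its call counter are hypothetical stand-ins for an object with an expensive computeValue(). The comparison function is wrapped with functools.cmp_to_key (available from Python 2.7 and 3.2) so the same code also runs on Pythons where sort no longer accepts a comparison function directly.

```python
from functools import cmp_to_key


class Item(object):
    """Hypothetical stand-in for an object with an expensive computeValue()."""
    calls = 0  # counts every computeValue() call across all items

    def __init__(self, value):
        self.value = value

    def computeValue(self):
        Item.calls += 1
        return self.value


def compare_desc(x, y):
    # Equivalent to -cmp(x.computeValue(), y.computeValue())
    a, b = x.computeValue(), y.computeValue()
    return (b > a) - (b < a)


items = [Item(v) for v in [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]]

# Comparison function: computeValue() runs twice for every comparison.
Item.calls = 0
by_cmp = sorted(items, key=cmp_to_key(compare_desc))
cmp_calls = Item.calls

# Key function: computeValue() runs exactly once per element.
Item.calls = 0
by_key = sorted(items, key=lambda x: x.computeValue(), reverse=True)
key_calls = Item.calls

print("cmp version: %d calls, key version: %d calls" % (cmp_calls, key_calls))
```

Both sorts produce the same descending order; only the number of computeValue() calls differs, and the gap widens as the list grows.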
2: If you are measuring the time taken by a chunk of Python code, it matters what platform you are running on. On Windows, time.clock() is the wall time, but on Unix, it is the processor time. As the timeit module shows, time.time() is the best option for wall time on Unix. It makes a huge difference to measure wall time rather than processor time.
We spent a while yesterday trying to find a missing half-second. It turns out it was all the time that our process wasn’t executing (for example, waiting for the database)!
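A rough sketch of the effect, for readers on a current Python: it uses time.time() for wall time as above, and time.process_time() for processor time. Note that time.process_time() is a later addition (Python 3.3) that was not available when this was written, and time.clock() has since been removed (Python 3.8). The wait_for_database function is a hypothetical stand-in that just sleeps.

```python
import time


def timed(func):
    """Return (wall_seconds, cpu_seconds) spent running func()."""
    wall_start = time.time()
    cpu_start = time.process_time()
    func()
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.time() - wall_start
    return wall_elapsed, cpu_elapsed


def wait_for_database():
    # Hypothetical stand-in for a blocking database call:
    # the process is idle, so almost no processor time is used.
    time.sleep(0.5)


wall, cpu = timed(wait_for_database)
print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

The wall time includes the half second spent waiting, while the processor time stays close to zero. That is the kind of gap that hid our missing half-second.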
3: If you have a time in fractional seconds, and you want to display it in milliseconds, you multiply by 1000, but don’t do it like this:
print "Elapsed time: %d" % secs*1000
Because that will print:
Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0Elapsed time: 0
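and so on, 1000 times. The % operator has the same precedence as *, and the two are evaluated left to right, so the string is formatted first and the result is then repeated 1000 times. What you really want is: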
print "Elapsed time: %d" % (secs*1000)
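The two forms can be compared side by side in a small sketch. It builds the strings before printing so that it runs on any Python version, and it repeats 3 times instead of 1000 to keep the output readable. The value of secs is just an example.

```python
secs = 0.5

# Evaluated left to right: ("Elapsed time: %d" % secs) * 3.
# The string is formatted first, then the result is repeated.
wrong = "Elapsed time: %d" % secs * 3

# Parentheses force the multiplication to happen before the formatting.
right = "Elapsed time: %d" % (secs * 1000)

print(wrong)   # Elapsed time: 0Elapsed time: 0Elapsed time: 0
print(right)   # Elapsed time: 500
```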
Comments
In [1]: x = .123567
In [2]: print "%d" % (x*1000)
123
When what you really want is 124. So:
In [3]: round(x, 3)
Out[3]: 0.124
In [4]: print "%d" % (round(x,3) * 1000)
124
works better for all involved.
/shudder
For the love of Raymond, please Google for "decorate-sort-undecorate" and save 999 more.
>>> x = .123567
>>> print "%.0f" % (x * 1000)
124