Thursday 24 January 2013 — This is 12 years old. Be careful.
I started a new side project this month, called Byterun. It’s a pure-Python implementation of Python bytecode execution. That is, it runs Python bytecode.
I did it because Coverage.py has some bugs in its branch coverage. It analyzes your program for potential branches by reading the bytecode rather than the source, and I think some of the bugs are due to subtle misunderstandings on my part about how Python bytecode works.
So I figured if I could have a working implementation of bytecode in Python, it would be a good way to be sure I understood how they work. I found a ten-year-old implementation called pyvm2.py by Paul Swartz, and started refurbishing it, fixing bugs, adding missing opcodes, and bringing it up to 2.7 and 3.3.
It isn’t finished yet, and supporting Python 2 and Python 3 might be a bit difficult, but already it has helped me understand a few things better, such as how generators are suspended and resumed: when the generator function yields, instead of discarding the stack frame as a normal function’s return does, the frame object is held by the generator object, so the next time a value is needed, the frame can be resumed just where it was.
Closure cells make more sense now too, and I know how to create them by hand. I was creating Python function objects with types.FunctionType. But one of its arguments is “closure”, which must be a tuple of cell objects. The types module has ways of making functions, generators, classes, and so on, but has no way to make a cell.
Turns out you can do it by creating a closure and grabbing its cells:
def make_cell(value):
# Construct an actual cell object by creating a closure right here,
# and grabbing the cell object out of the function we create.
return (lambda x: lambda: x)(value).func_closure[0]
A fine bit of Python haiku there...
Comments
In Cpython you can also use ctypes to call PyCell_New:
@Nick (et al): the nut of the idea came from Alex Gaynor (of course...)
I'm glad someone takes bytecode seriously.
So much more can be elucidated when you drop from symbolic to bytecode level...
p.s. pypy evaluator is perhaps too complex for the task?
Add a comment: