Organic metaclasses

Sunday 18 April 2010

This is almost 15 years old. Be careful.

The way I learn things, I can read about something a number of times and intellectually understand it, but it won’t really sink in until I have a real reason to try it out myself. Toy examples don’t do it for me; I have to have an actual problem in hand before the solution becomes part of my repertoire. Recently I finally had a use for metaclasses.

I wanted to create an in-memory list of items that I could reference by key. It was a micro-database of languages:

class Language(object):

    # The class attribute of all languages, mapped by id.    
    _db = {}
    
    def __init__(self, **kwargs):
        for k, v in kwargs.iteritems():
            setattr(self, k, v)
        self._db[self.id] = self
        
    @classmethod
    def get(cls, key):
        return cls._db.get(key)

Language(
    id = 'en',
    name = _('English'),
    native = u'English',
    )

Language(
    id = 'fr',
    name = _('French'),
    native = u'Fran\u00E7ais',
    )

Language(
    id = 'nl',
    name = _('Dutch'),
    native = u'Nederlands',
    )

# Some time later:
lang = Language.get(langcode)
lang.native # blah blah

This worked well: it gave me a simple schema-less set of constant items that I could look up by id. And the class attribute _db is used implicitly in the constructor, so I get a clean declarative syntax for building my list of languages.

But then I wanted another set, for countries, so I made a MiniDbItem class to derive both Language and Country from:

class MiniDbItem(object):
    def __init__(self, **kwargs):
        for k, v in kwargs.iteritems():
            setattr(self, k, v)
        self._db[self.id] = self
        
    @classmethod
    def get(cls, key):
        return cls._db.get(key)

class Language(MiniDbItem):
    _db = {}

Language(id='en', ...)
Language(id='fr', ...)

class Country(MiniDbItem):
    _db = {}
    
Country(id='US', ...)
Country(id='FR', ...)

This works, but the unfortunate part is that each derived class has to define its own _db class attribute to keep the Languages separate from the Countries. Each derived class is obligated to do that little bit of redundant work, or the MiniDbItem base class isn’t used properly.
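To make that obligation concrete, here is a minimal sketch of what happens when a derived class forgets its _db. It is written for Python 3 (so items() rather than iteritems()), and Currency is a hypothetical class invented for the illustration, not part of the original code.

```python
class MiniDbItem(object):
    def __init__(self, **kwargs):
        for k, v in kwargs.items():   # Python 3 spelling of iteritems()
            setattr(self, k, v)
        self._db[self.id] = self      # relies on the subclass providing _db

    @classmethod
    def get(cls, key):
        return cls._db.get(key)


class Language(MiniDbItem):
    _db = {}                          # did the required bit of work


class Currency(MiniDbItem):           # hypothetical class that forgot its _db
    pass


Language(id='en', native='English')   # registers itself in Language._db

try:
    Currency(id='USD')
    forgot_error = None
except AttributeError as exc:
    # No _db anywhere on Currency or its bases, so the constructor fails.
    forgot_error = exc
```

Defining _db = {} once on MiniDbItem would silence the error, but then Language and Country would share a single dict, which is exactly the mixing the per-class attribute is meant to prevent.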

The way to avoid that is to use a metaclass. The metaclass provides an __init__ method. In a class, __init__ is called when new class instances are created, but in a metaclass, __init__ is called when new classes are created.

class MetaMiniDbItem(type):
    """ A metaclass to give every class derived from MiniDbItem
        a _db attribute.
    """
    def __init__(cls, name, bases, dict):
        super(MetaMiniDbItem, cls).__init__(name, bases, dict)
        # Each class has its own _db, a dict of its items
        cls._db = {}

class MiniDbItem(object):
    
    __metaclass__ = MetaMiniDbItem

    def __init__(self, **kwargs):
        for k, v in kwargs.iteritems():
            setattr(self, k, v)
        self._db[self.id] = self
        
    @classmethod
    def get(cls, key):
        return cls._db.get(key)

class Language(MiniDbItem): pass

Language(id='en', ...)
Language(id='fr', ...)

class Country(MiniDbItem): pass
    
Country(id='US', ...)
Country(id='FR', ...)

Now MetaMiniDbItem.__init__ is invoked every time a class using the metaclass is defined: once for MiniDbItem itself, once when class Language is defined, and again when class Country is defined. The class being constructed is passed in as the cls parameter. We use super to invoke the regular class creation machinery, then simply set the _db attribute on the class like we want.
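The code above uses the Python 2 __metaclass__ attribute, which Python 3 ignores. As a rough sketch of the same idea in Python 3, assuming the metaclass= keyword in the class statement and items() in place of iteritems(), it might look like this. The created list is a hypothetical addition that only records when the metaclass __init__ runs.

```python
created = []   # hypothetical log of the classes the metaclass has initialised


class MetaMiniDbItem(type):
    """Give every class built by this metaclass its own _db dict."""
    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        cls._db = {}              # a fresh dict per class, set at class creation
        created.append(name)


class MiniDbItem(metaclass=MetaMiniDbItem):   # Python 3 metaclass syntax
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        self._db[self.id] = self

    @classmethod
    def get(cls, key):
        return cls._db.get(key)


class Language(MiniDbItem):
    pass


class Country(MiniDbItem):
    pass


Language(id='fr', native='Fran\u00E7ais')
Country(id='FR', name='France')
```

After this runs, created holds MiniDbItem, Language and Country in that order, and Language._db and Country._db are two distinct dicts, so a lookup of 'FR' in Language finds nothing.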

Of course, metaclasses can be used to do many more things than simply setting a class attribute, but this example was the first time in my work that metaclasses seemed like a natural solution to a problem rather than an advanced-magic Stupid Python Trick.

Comments

[gravatar]
A metaclass is a bit heavy-handed for a class variable.

Try:
class MiniDbItem(object):
    
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
        self._db()[self.id] = self
      
    @classmethod
    def _db(cls):
        try:
            db = cls.__db
        except AttributeError:
            db = cls.__db = {}
        return db

    @classmethod
    def get(cls, key):
        return cls._db().get(key)
[gravatar]
    def __init__(cls, name, bases, dct):
        super(MetaMiniDbItem, cls).__init__(name, bases, dct)
        # Each class has its own _db, a dict of its items
        cls._db = {}

it would probably be cleaner to alter dct (usually not dict, or it overwrites the builtin) before calling super, and I believe that's usually done in __new__:

    def __new__(typ, name, bases, dct):
        return super(MetaMiniDbItem, typ).__new__(typ, name, bases, dict(dct, _db={}))
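For comparison, a minimal runnable sketch of this __new__ variant, assuming Python 3 syntax (metaclass= keyword, zero-argument super). Note that type.__new__ needs the metaclass itself as its first argument. The item classes are cut down to the bare minimum for the illustration.

```python
class MetaMiniDbItem(type):
    def __new__(typ, name, bases, dct):
        # Add a fresh _db to the class namespace before the class is built.
        return super().__new__(typ, name, bases, dict(dct, _db={}))


class MiniDbItem(metaclass=MetaMiniDbItem):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
        self._db[self.id] = self

    @classmethod
    def get(cls, key):
        return cls._db.get(key)


class Language(MiniDbItem):
    pass


class Country(MiniDbItem):
    pass


Language(id='en', native='English')
Country(id='US', name='United States')
```

The effect is the same as the __init__ version: each class ends up with its own _db in its own namespace. The difference is only in when the attribute is added, before the class object exists rather than just after.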
[gravatar]
@anon: your way also works. Although a metaclass does seem a bit heavy here, I like the way it exactly takes over the creation of class attributes that I had been doing by hand. Your way has the same effect but with a slightly different dynamic.

@masklinn: thanks for the tip about dct vs. dict, but I don't agree that it's cleaner to bundle everything into the __new__ call. That seems overloaded to me.

In any case, thanks for the suggestions. As a metaclass newb, it's good to see other possibilities.
[gravatar]
An interesting problem is: what is the Right Thing to do if there is a subclass of a subclass of MiniDbItem? What the code as given will do is give it an entirely separate database, but does that make sense for the application? On the other hand, does creating a database for each direct-subclass only make sense?
[gravatar]
@Kevin, true I hadn't considered sub-classes, and I don't expect to need them for this application. I suppose you could make a case both ways: that subclasses should share a db, and that subclasses should get their own.
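The behavior Kevin describes can be checked with a short sketch. This assumes Python 3 syntax, and Dialect is a hypothetical subclass of Language made up for the illustration.

```python
class MetaMiniDbItem(type):
    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        cls._db = {}              # runs for every class, however deep


class MiniDbItem(metaclass=MetaMiniDbItem):
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        self._db[self.id] = self

    @classmethod
    def get(cls, key):
        return cls._db.get(key)


class Language(MiniDbItem):
    pass


class Dialect(Language):          # hypothetical subclass of a subclass
    pass


Language(id='en', native='English')
Dialect(id='en-GB', native='British English')

# Each class only sees the items created through it.
in_language = Language.get('en-GB')   # None: Dialect items are not in Language._db
in_dialect = Dialect.get('en-GB')     # the Dialect item
```

Because the metaclass __init__ runs for Dialect too, it gets a _db of its own rather than inheriting Language's. Sharing one database down a hierarchy would need extra logic in the metaclass to decide which classes get a fresh dict.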
[gravatar]
Another solution is using a class factory, à la namedtuple:
class MiniDbItem(object):
    def __init__(self, **kargs):
        self.__dict__.update(kargs)
        self._db[self.id] = self

    @classmethod
    def get(cls, key):
        return cls._db.get(key)

def MiniDb(name):
    return type(name, (MiniDbItem,), {'_db':dict()})

Language = MiniDb('Language')
Country = MiniDb('Country')

Language(id='en', name="English")
Country(id='fr', name='France')

print Language.get('en').name
print Country.get('fr').name
[gravatar]
It seems to me that making the DB implicitly in the class is causing the need for unnecessary complexity. My thought is to have a DB that generates entries when called, rather than an object that adds itself to a DB:
class MiniDB(dict):
    def __init__(self, cls=object, **kwargs):
        dict.__init__(self, **kwargs)
        self.default_entry = cls

    def __call__(self, id, **kwargs):
        self[id] = entry = self.default_entry()
        entry.id = id
        for k, v in kwargs.iteritems():
            setattr(entry, k, v)

Language = MiniDB(type('Language', (object,), dict()))
Language(id = 'en', name = 'English', native = u'English')

Country = MiniDB(type('Country', (object,), dict()))
print Country.get('US').language.native
