Transparently use Avro schema (.avsc) files in a Python module

One of the cool things about Avro is that it has bindings in several languages. However, I think the only one with native code generation support for Avro objects is Java, which makes working with Avro in the other languages a bit harder. Here’s a simple way to load your schemata dynamically in Python. (If you don’t want to write your schemata by hand, you can generate them from AVDL files with Maven, using the tip from my previous post.)

What this bit of code does is replace the module object in sys.modules with an instance of AvroSchemaLoader. Python only calls that object's __getattr__ method when a normal attribute lookup fails, so any time you access a type on the module that hasn't been loaded yet, it loads the Avro schema from the file of the same name with the .avsc extension and caches the result. To use this code, create a file called __init__.py in your directory of .avsc files, and paste the following code in.

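```python
import sys
from os.path import join, dirname
import avro.schema

class AvroSchemaLoader(object):
    '''
        This object allows us to lazily load schema files in the current
        directory and parse them as needed.

        It is intended to be used as a replacement for the current module in
        sys.modules, so usage of this object should be transparent to users.

        For example, to access the schema parsed from Foo.avsc (assuming it
        defines a record), you would do the following:

            >>> from this_dir_name import Foo
            >>> type(Foo)
            <class 'avro.schema.RecordSchema'>
    '''

    def __init__(self, module):
        # Keep a reference to the original module so it is not garbage
        # collected. On Python 2, a collected module has its globals reset to
        # None, which would break join, dirname and __file__ in __getattr__.
        self.__module = module

    def __getattr__(self, name):
        # Only called when normal lookup fails. Report missing dunder names
        # (__path__, __spec__, ...) as missing instead of looking for a file
        # such as __path__.avsc. (object has no __getattr__ to delegate to.)
        if name.startswith('__'):
            raise AttributeError(name)

        path = join(dirname(__file__), '%s.avsc' % name)
        try:
            with open(path, "r") as fp:
                schema = avro.schema.parse(fp.read())
        except IOError:
            # No such schema file: raise AttributeError so hasattr(),
            # getattr() with a default and "from ... import" report it
            # like any other missing name
            raise AttributeError("no schema file %s" % path)

        # Cache the parsed schema so later lookups skip __getattr__
        setattr(self, name, schema)
        return schema


# Replace this module instance with the dynamic loader
sys.modules[__name__] = AvroSchemaLoader(sys.modules[__name__])
```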
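The import behavior can be tried out without Avro installed. The sketch below is a stand-in, not the loader above: a hypothetical LazySchemaModule class uses json.load in place of avro.schema.parse (an .avsc file is plain JSON), takes its schema directory as an argument instead of using __file__, and is registered directly in sys.modules under the hypothetical name demo_schemas, which is all the last line of the __init__.py does.

```python
import json
import os
import sys
import tempfile


class LazySchemaModule(object):
    """Hypothetical stand-in for AvroSchemaLoader: json.load replaces
    avro.schema.parse so the sketch needs only the standard library."""

    def __init__(self, directory):
        self._directory = directory

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        if name.startswith('__'):
            raise AttributeError(name)
        path = os.path.join(self._directory, '%s.avsc' % name)
        try:
            with open(path) as fp:
                schema = json.load(fp)  # .avsc files are plain JSON
        except IOError:
            raise AttributeError(name)
        setattr(self, name, schema)  # cache: later lookups skip __getattr__
        return schema


# Build a throwaway directory holding a single Foo.avsc
schema_dir = tempfile.mkdtemp()
with open(os.path.join(schema_dir, 'Foo.avsc'), 'w') as fp:
    json.dump({"type": "record", "name": "Foo",
               "fields": [{"name": "id", "type": "long"}]}, fp)

# Install the loader under a hypothetical module name, as the __init__.py does
sys.modules['demo_schemas'] = LazySchemaModule(schema_dir)

from demo_schemas import Foo  # the import machinery calls __getattr__('Foo')

loader = sys.modules['demo_schemas']
print(Foo['name'])                  # Foo
print(loader.Foo is Foo)            # True: served from the cache
print(hasattr(loader, 'Missing'))   # False: there is no Missing.avsc
```

The last check is the reason for raising AttributeError on a missing file: the loader then answers hasattr() and getattr() with a default the same way an ordinary module would.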

There’s a lot you can do to make this better, like loading a wrapper around the schema instead of using the schema directly. I’ll leave that as an exercise for the reader. 🙂
