Test data provider using Python metaclass

Posted on Sat 01 October 2011 in Coding

In my airpnp project, I had to write a binary plist parser in order to parse certain data posted by iDevices. To support my unit tests, I created a number of binary plist files using an existing tool. I ended up with 17 files, and quickly realized that I wanted to use a data provider and a single test method. I’m very familiar with TestNG, where a data provider approach (in Java) would look something like this:

public class TestClass {
    @DataProvider()
    public Iterator<Object[]> providerMethod() {
        // return an iterator over test data rows
    }

    @Test(dataProvider = "providerMethod")
    public void testMethod(...data parameters...) {
        // test and assert
    }
}

When TestNG runs, it calls testMethod once for each row of data produced by providerMethod’s iterator. Python’s unittest module doesn’t have support for data providers, so I had to roll my own. Since the unittest module discovers test methods by enumerating the attributes of a type, looking for methods whose names start with “test”, I had to create such methods dynamically. Having just learned about Python metaclasses, I thought it would be fun and educational to create a metaclass-based data provider. (Note that I don’t claim that using a metaclass is the best solution for implementing data providers. I don’t even claim that it’s a good solution!)
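To make the discovery rule concrete, here is a minimal sketch in Python 3 syntax that attaches generated methods to an existing class with setattr and then asks the default loader which test names it sees. The names TestSquares and make_test, and the squaring data, are made up for illustration; they are not part of airpnp.

```python
import unittest


class TestSquares(unittest.TestCase):
    """Hypothetical test case; its test methods are attached below."""


def make_test(value, expected):
    # A factory function, so that each generated method captures its own
    # value/expected pair instead of sharing the loop variables.
    def test(self):
        self.assertEqual(value * value, expected)
    return test


for value, expected in [(2, 4), (3, 9)]:
    setattr(TestSquares, "test_square_%d" % value, make_test(value, expected))

# The default loader picks up callables whose names start with "test".
loader = unittest.TestLoader()
names = loader.getTestCaseNames(TestSquares)
print(names)

# Run the generated tests silently and collect the outcome.
result = unittest.TestResult()
loader.loadTestsFromTestCase(TestSquares).run(result)
```

The loader only cares about attribute names on the class, so it makes no difference to unittest whether the methods were written by hand, attached afterwards, or injected by a metaclass.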

I ended up with a metaclass that iterates over my test data and generates a test method for each row of data. This is what the code looked like:

# imports omitted...
DATA = [
    ['plist/true.bin', True],
    ['plist/false.bin', False],
    # ...more data rows omitted...
]

class DataProvider(type):
    def __new__(meta, classname, bases, classDict):
        def create_test_method(fname, expected):
            def test_method(self):
                fd = self.read_file(fname)
                obj = read_binary_plist(fd)
                self.assertEqual(obj, expected)
            return test_method

        # generate new methods based on test data
        for fname, expected in DATA:
            part = os.path.splitext(os.path.basename(fname))[0]
            classDict["test_" + part] = create_test_method(fname, expected)

        # create!
        return type.__new__(meta, classname, bases, classDict)

class TestReadBinary(unittest.TestCase):
    __metaclass__ = DataProvider

    def read_file(self, fname):
        fd = open(os.path.join(os.path.dirname(__file__), fname), 'rb')

        try:
            s = fd.read()
            return StringIO(s)
        finally:
            fd.close()

The approach above has two problems:

The approach is not at all generic; it only applies to the particular situation for which it was conceived. In particular, the metaclass depends on the test class (it calls its read_file method) as well as on the structure of the test data.
The test data are defined outside of the test class. While not a big problem, it’s nicer to keep the test data closer to where they are used.

For this blog post, I decided to create a more generic solution, and the result is shown below. The solution handles both problems, and resembles the TestNG/Java approach with some notable differences:

The data provider is not technically an instance method, as it has no self parameter. This is because we need to call it while defining the class (type).
The data provider method is discovered by name prefix, and binds to a test method using a naming convention. This may be OK, though, since test methods are discovered by name prefix anyway.

import unittest

class DataProviderSupport(type):
    def __new__(meta, classname, bases, classDict):
        # method for creating our test methods
        def create_test_method(testFunc, args):
            return lambda self: testFunc(self, *args)

        # look for data provider functions
        # (iterate over a copy, since classDict is modified inside the loop)
        for attrName, attr in list(classDict.items()):
            if attrName.startswith("dataprovider_"):
                # find out the corresponding test method
                testName = attrName[len("dataprovider_"):]
                testFunc = classDict[testName]

                # the test method is no longer needed
                del classDict[testName]

                # generate test method variants based on
                # data from the data provider function
                i = 1
                for args in attr():
                    classDict[testName + str(i)] = create_test_method(testFunc, args)
                    i += 1

        # create the type
        return type.__new__(meta, classname, bases, classDict)

All error checking has been excluded for brevity. In practice, there are a couple of things you would want to validate, for example that a test method matching each data provider actually exists (the code above fails with a KeyError otherwise). Here’s an example of how to use it (the dataprovider_test_len_function method is a generator, but it might as well have returned a list of lists or tuples):

class TestStringLength(unittest.TestCase):
    __metaclass__ = DataProviderSupport

    def dataprovider_test_len_function(): # no self!
        yield ("abc", 3)
        yield ("", 0)
        yield ("a", 1)

    def test_len_function(self, astring, expectedLength):
        self.assertEqual(expectedLength, len(astring))
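
The __metaclass__ attribute used above is Python 2 syntax; Python 3 ignores it and expects the metaclass as a keyword argument in the class statement instead. As a rough sketch, assuming Python 3, the same idea could look like the code below. The PREFIX constant and the TypeError raised for a missing test method are my own additions for illustration, showing one of the validations mentioned earlier.

```python
import unittest


class DataProviderSupport(type):
    PREFIX = "dataprovider_"  # naming convention, same as in the post

    def __new__(meta, classname, bases, class_dict):
        def create_test_method(test_func, args):
            return lambda self: test_func(self, *args)

        # Iterate over a snapshot, since class_dict is modified in the loop.
        for attr_name, attr in list(class_dict.items()):
            if not attr_name.startswith(meta.PREFIX):
                continue

            test_name = attr_name[len(meta.PREFIX):]
            if test_name not in class_dict:
                # Illustrative validation: fail early with a clear message.
                raise TypeError("%s has no matching method %s"
                                % (attr_name, test_name))

            # Remove the parameterized method and add one variant per row.
            test_func = class_dict.pop(test_name)
            for i, args in enumerate(attr(), 1):
                class_dict[test_name + str(i)] = create_test_method(test_func, args)

        return type.__new__(meta, classname, bases, class_dict)


class TestStringLength(unittest.TestCase, metaclass=DataProviderSupport):
    def dataprovider_test_len_function():  # no self!
        yield ("abc", 3)
        yield ("", 0)
        yield ("a", 1)

    def test_len_function(self, astring, expected_length):
        self.assertEqual(expected_length, len(astring))


loader = unittest.TestLoader()
names = loader.getTestCaseNames(TestStringLength)

# Run the generated tests silently and collect the outcome.
result = unittest.TestResult()
loader.loadTestsFromTestCase(TestStringLength).run(result)
```

With this in place, the loader should see test_len_function1 through test_len_function3, while the original test_len_function is gone from the class.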

It would have been nicer to use decorators to bind the data provider with the test method. However, a method decorator (with or without arguments) only receives the function it decorates and returns a single replacement for it, so on its own it cannot add several generated test methods to the class; something would still have to expand them when the class is created. Name binding was the simplest solution I could come up with at this point!
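
That said, a decorator-based binding seems possible if the method decorator only tags the test function with its provider and a class decorator does the expansion once the class exists. The following is just a sketch in Python 3 syntax; data_provider, expand_data_providers and length_rows are hypothetical names of my own, not an existing API.

```python
import unittest


def data_provider(provider):
    """Tag a test function with the callable that yields its argument rows."""
    def decorate(test_func):
        test_func.provider = provider
        return test_func
    return decorate


def expand_data_providers(cls):
    """Class decorator: replace each tagged method with one method per row."""
    def create_test_method(test_func, args):
        return lambda self: test_func(self, *args)

    # Iterate over a snapshot, since the class is modified in the loop.
    for name, attr in list(vars(cls).items()):
        provider = getattr(attr, "provider", None)
        if provider is None:
            continue
        delattr(cls, name)
        for i, args in enumerate(provider(), 1):
            setattr(cls, name + str(i), create_test_method(attr, args))
    return cls


def length_rows():
    yield ("abc", 3)
    yield ("", 0)


@expand_data_providers
class TestStringLength(unittest.TestCase):
    @data_provider(length_rows)
    def test_len_function(self, astring, expected_length):
        self.assertEqual(expected_length, len(astring))


loader = unittest.TestLoader()
names = loader.getTestCaseNames(TestStringLength)

# Run the generated tests silently and collect the outcome.
result = unittest.TestResult()
loader.loadTestsFromTestCase(TestStringLength).run(result)
```

The trade-off is that the provider has to be defined before the test method that refers to it, and the class needs its own decorator, so it is not obviously simpler than the naming convention.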

Update, 2011-10-03: The code presented in this post does not work properly with nose, since nose discovers test cases differently from the unittest module. Also, nose supports test generators out of the box, which makes it kind of pointless to make the code compatible with nose.