August 6, 2010

Tests That Test Your Tests

I just had an interesting experience with Steve Miller, the technical editor of the Python classes I am writing for O'Reilly School of Technology. We are just getting to the end of the second of four courses, and I like to think that we are moving right along: in the final lesson students write a simple GUI-based program that searches for and displays e-mail messages stored in a MySQL database.

I introduced test-driven development in the second course. Not only does this encourage good habits in the students, it also makes it somewhat easier to test some of their exercises (though I still do not have a good approach to testing Tkinter-based GUI applications). In time-honored fashion we start with tests and a program full of stubs, and then expand the stubs to pass the tests. For the email database the initial API is very simple: there is one function to store messages and two others to retrieve them, by primary key and Message-Id.

In the final chapter I start out with a very simple database table that stores the body of the message as a LONGTEXT column. The only other columns in the table (to start with—it gets more complex later) are the automatically-generated primary key and the Message-Id Header. The tests use a setUp() method that completely re-creates the database table and populates it from messages stored in a bunch of files:

 FILESPEC = "V:/Python2/Lesson12/MailData/*.eml"  
 class testRealEmail_traffic(unittest.TestCase):  
   def setUp(self):  
     Reads an arbitrary number of mail messages and  
     stores them in a brand new messages table.  
     DANGER: Any existing message table WILL be lost.  
     curs.execute("DROP TABLE IF EXISTS message")  
     files = glob(FILESPEC)  
     self.msgids = {} # Keyed by message_id  
     self.message_ids = {} # keyed by id  
     for f in files:  
       ff = open(f)  
       text =  
       msg = message_from_string(text)  
       id = self.msgids[msg['message-id']] =  
       self.message_ids[id] = msg['message-id']  

There were two relatively simple tests, one to test each of the retrieval functions in a fairly simplistic way by verifying the correspondence between the primary key values and the Message-Id headers using the msgids and message_ids dicts created during the set-up. The initial stub under test only implemented the store() function, so these tests were initially expected to fail:

   def test_message_ids(self):  
     Verify that items retrieved by id have the correct Message-ID.  
     for message_id in self.msgids.keys():  
       pk, msg = maildb.msg_by_id(self.msgids[message_id])   
       self.assertEqual(msg['message-id'], message_id)  
   def test_ids(self):  
     Verify that items retrieved by message_id have the correct Message-ID.  
     for id in self.message_ids.keys():  
       pk, msg = maildb.msg_by_message_id(self.message_ids[id])  
       self.assertEqual(msg['message-id'], self.message_ids[id])  

Steve and I had both run this code under much the same conditions as the students would, and verified that the tests did indeed fail due to the AttributeError exceptions raised by the missing msg_by_id() and msg_by_message_id() functions. Later steps have the student implement these functions, which makes the tests pass.

We were somewhat surprised to find when Steve ran his final checks that the tests were now passing, even though he had reverted to the original module with no implementations of the message retrieval functions! It took me the best part of an hour chatting with Steve to eliminate everything I could think of that might be wrong: no .pyc files left lying around, no odd path settings that allowed import from other copies of the code, and so on.

We finally tracked the issue down to something stupidly simple, as is often the case with bugs that have you scratching your head for an extended period: for production purposes the data files had been moved to an area where all students could share them (each student has their own V: directory). This meant that the setUp() method was not inserting any rows into the newly-created message table. The empty table in turn meant that neither test_message_ids() nor test_ids() was running the the body of the for loop, and consequently no AttributeError exceptions were being raised. The tests were passing even though the functions they were supposed to test had not been implemented!

My solution to this was to add a further test to verify that the table was not empty. That way, even if the other tests passed, this one would fail:

   def test_not_empty(self):  
     Verify that the setUp method actually created some messages.  
     If it finds no files there will be no messages in the table,  
     the loop bodies in the other tests will never run, and potential  
     errors will never be discovered.  
     curs.execute("SELECT COUNT(*) FROM message")  
     messagect = curs.fetchone()[0]  
     self.assertGreater(messagect, 0, "Database message table is empty")  

The check could have been added to one of the other tests but it seemed to make more sense to keep it separate, since several tests relied on the table having been populated.

In this case the error condition was that the test were passing! I am happy about this because it clearly demonstrates the value of test-driven development, even though the result I was getting was normally the desired goal of testing. It has also taught me to be more careful about tests in loops: if there is no guarantee that the loop body will execute then the tests inside it can be completely useless.

1 comment:

PeterHansen said...

Good point about tests in loops, or any other code path that may not be entered.

A common technique to avoid this issue (though the sanity-test approach is good too) is to assert something about the data before using it. Here I'd do a simple self.assert_(self.msgids) before the loop, or perhaps something more specific with len(self.msgids)