This was a practical tutorial by Titus Brown and Grig Gheorghiu, two accomplished presenters who have given popular presentations at the last two PyCons. Titus opened the presentation and told us he wanted us to be "test-infected" - to make testing a necessary part of our development process. He confessed he feels a sense of physical discomfort when he writes untested code.
He gave us a number of practical guidelines, the most interesting of which were: start small; focus on actual problems; use continuous integration. The goal of testing should be to fail as quickly and as hard as possible: failure is good, because it reveals a need for revision. You should plan for testability, and as your experience level increases you will find that you naturally tend to do this.
Use a hierarchy of tests. The initial tests should run in under 20 seconds; only if those pass should you run the full suite, which takes longer; then run the regression tests, and then (if you have defined them) the acceptance tests. The point is that much of this work isn't done on your time: your continuous integration run may happen overnight, but you don't have to stay while it runs. Titus feels that any test run that takes over a couple of hours has limited (though definitely not zero) value, because of the lack of immediacy in the feedback it provides.
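As a rough illustration, a tiered run can be as simple as a small driver script that refuses to start the slower tiers until the quick one passes (the module names here are invented):

    # Hypothetical tiered test runner: the test module names are made up
    # for illustration; the idea is just "fast tests gate the slower ones".
    import sys
    import unittest

    def run(module_name):
        """Load and run all unittest tests found in the named module."""
        suite = unittest.defaultTestLoader.loadTestsFromName(module_name)
        result = unittest.TextTestRunner(verbosity=1).run(suite)
        return result.wasSuccessful()

    if __name__ == '__main__':
        # Quick tests first: these should finish in well under 20 seconds.
        if not run('quick_tests'):
            sys.exit('Quick tests failed - not bothering with the slow ones.')
        # Only then spend the time on the full and regression suites.
        for slower in ('full_tests', 'regression_tests'):
            if not run(slower):
                sys.exit('%s failed.' % slower)
        print('All tiers passed.')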
Taxonomy of testing: unit tests, functional tests, regression tests, UI tests, acceptance tests, integration tests, continuous integration, performance tests. Do you even know what all these are? Which do you need to define and use?
The more complicated it is to run the right tests, the higher the probability they won't get run. Titus talked about two strategies: TDD (Test-Driven Development) and SDT (Stupidity-Driven Testing), which he equated with TED (Test-Enhanced Development). The general myth is that you are a bad person if you don't use TDD, but SDT is very important because you can ensure that bugs get properly squashed by adding specific tests after discovering one of the gruesome faults we all put in our software from time to time.
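In practice SDT just means pinning every bug you find with a test before you call it fixed. A tiny, made-up example:

    # Stupidity-driven testing in miniature: after discovering that a
    # (hypothetical) parse_price() helper choked on surrounding whitespace,
    # pin the bug down with a test so it can never silently come back.
    import unittest
    from decimal import Decimal

    def parse_price(text):
        """Convert a price string like ' 3.50 ' to a Decimal."""
        return Decimal(text.strip())   # the fix: strip whitespace first

    class TestParsePriceRegressions(unittest.TestCase):
        def test_whitespace_regression(self):
            # This input used to blow up before the fix.
            self.assertEqual(parse_price('  3.50 '), Decimal('3.50'))

    if __name__ == '__main__':
        unittest.main()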
Tests can be used to constrain your code; they force you to document your expectations, both internally and externally. Internally, use more assert statements. Externally, write unit, functional and regression tests as "black box" tests to verify external behavior.
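A made-up illustration of both kinds of expectation:

    # Internal expectations as assert statements: the asserts document what
    # the (hypothetical) function assumes and what it promises.
    def merge_sorted(a, b):
        """Merge two already-sorted lists into one sorted list."""
        assert a == sorted(a) and b == sorted(b), "inputs must be sorted"
        merged = sorted(a + b)          # simple but adequate for illustration
        assert len(merged) == len(a) + len(b), "no items lost or invented"
        return merged

    # External expectations as a black-box test: only inputs and outputs,
    # nothing about how merge_sorted works inside.
    def test_merge_sorted():
        assert merge_sorted([1, 3], [2, 4]) == [1, 2, 3, 4]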
A "test umbrella" removes the need for memory. Everything should report back to a single source. This gives you a way to tell a new developer "run this; if it doesn't work, you have something to fix". You just have to remember to run the tests, not go through some complex sequence of commands that need separate documentation. This automation investment will reap big dividends in both time and quality.
Testing really helps when you want to refactor.
While code coverage may not be a good metric, it's definitely the case that if you don't run every line of code at least once in your tests then the possibility of failure becomes much higher. Line coverage is necessary, but not sufficient. Continuous integration helps you discover less obvious errors, like changes in process you forgot to document.
wsgi_intercept is a useful test mechanism: it monkey-patches the Python web client libraries so that requests to a chosen host and port are routed directly to a WSGI application in-process, without touching the network.
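From memory (so treat the exact module paths and function names as assumptions), usage looks something like this: install the interception for a client library, map a host and port to a WSGI application factory, and ordinary client calls then never touch a socket:

    # A sketch of wsgi_intercept usage as I understand its API of the time;
    # module paths and names may differ between versions.
    import wsgi_intercept
    from wsgi_intercept.urllib2_intercept import install_opener
    import urllib2

    def simple_app(environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain')])
        return ['hello from a WSGI app under test']

    install_opener()                                        # patch urllib2
    wsgi_intercept.add_wsgi_intercept('localhost', 80, lambda: simple_app)

    # This "HTTP request" is answered by simple_app directly, in-process.
    print(urllib2.urlopen('http://localhost:80/').read())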
[Interesting demos; I went for water for the speakers just before they started; my tests failed, and I never discovered whether they were supposed to or whether I had missed some crucial step while I was out].
The scotch tool is a recording proxy that will record everything that passes between the browser and the server: exceptionally useful for capturing AJAX interactions. It can also replay recorded sessions, and ensure that the responses that come back are "the same" (with minor exceptions such as ignoring date changes).
Figleaf is a code coverage recording tool. It saves its results in a pickle-based format and can perform intelligent comparisons between test runs, taking unions and intersections of coverage so you can easily discern what still needs testing.
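I believe figleaf can also be driven programmatically, roughly as below; the function names are from memory and should be treated as assumptions, and the simpler route is just the figleaf / figleaf2html command-line pair:

    # A rough sketch of recording coverage with figleaf from inside a script.
    import figleaf

    figleaf.start()                      # begin recording executed lines
    import mymodule                      # hypothetical module under test
    mymodule.do_something()              # hypothetical call being exercised
    figleaf.stop()

    figleaf.write_coverage('.figleaf')   # pickled results, ready for
                                         # figleaf2html or for set arithmetic
                                         # against another test run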
Nose is a system that allows testing with a much lighter footprint than unittest. It will also run any other doctests and unittests it can find, if properly encouraged to do so, and can be used to add module-level setup and teardown.
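A test module for nose can be as light as plain functions plus optional module-level fixtures (everything below is a made-up example):

    # test_shopping.py -- the kind of lightweight module nose picks up
    # automatically: files and functions starting with "test" are found
    # without any unittest class boilerplate.

    _cart = None

    def setup_module():
        # nose runs this once before any test in the module
        global _cart
        _cart = []

    def teardown_module():
        # ...and this once after the last test in the module
        global _cart
        _cart = None

    def test_cart_is_a_list():
        assert isinstance(_cart, list)

    def test_cart_accepts_items():
        _cart.append('book')
        assert 'book' in _cart

Running nosetests in the project directory picks this up; adding --with-doctest pulls in doctests as well.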
Grig then took over and first described and then demonstrated Selenium, a testing tool written in JavaScript that runs inside the browser (and probably one of the best-known web testing tools). It's one of the few tools that can adequately test AJAX functionality, and it's the default testing tool at Google. It has several components.
The core is augmented by an RC (remote control) module, an IDE (a Firefox add-on), and a Grid module that seems to relate to parallelized testing.
The biggest issue in defining Selenium tests is locating the required elements in the page. It would be much easier if web authors gave all elements IDs, but in practice you often end up using XPath expressions, which have their downsides; many times, though, there is no other way.
The IDE component allows you to record interactions with web servers, but again you have to be careful to use repeatable event sequences (e.g. pasting is not repeatable); the IDE is probably not as easy to use as Twill, but some AJAX applications are almost pure client-side, with very few server round-trips, and Twill is useless with these as it can't intercept the traffic.
Firebug is an essential component in developing the tests, as is the XPath Checker add-on, which helps greatly in generating the element locators.
The basis of Selenium tests is a long string of locators and assertions. Selenium RC runs as a reverse HTTP proxy, bypassing the "same origin" cross-site scripting protections that normally insist that scripted content has to come from a single domain. By fooling the browser this way, remote control allows you to write tests of a remote server inside the local proxy. It also allows you to build simple looping constructs. Of interest to readers will be its ability to export test cases as (rather ugly) Python code. Google has a whole team working on Selenium, and others are also working to improve it and extend it to broader browser support.
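An exported test ends up looking roughly like this old-style Selenium RC case (the site, locators and expected text below are invented, and the details will vary with the Selenium version):

    # Roughly the shape of a test exported from the Selenium IDE as Python,
    # driving the old Selenium RC client.
    from selenium import selenium
    import unittest

    class TestSearch(unittest.TestCase):
        def setUp(self):
            self.sel = selenium("localhost", 4444, "*firefox",
                                "http://www.example.com/")
            self.sel.start()

        def test_search(self):
            sel = self.sel
            sel.open("/")
            sel.type("q", "selenium rc")                  # locate by id/name
            sel.click("xpath=//input[@type='submit']")    # or fall back to XPath
            sel.wait_for_page_to_load("30000")
            self.assertTrue(sel.is_text_present("Results"))

        def tearDown(self):
            self.sel.stop()

    if __name__ == '__main__':
        unittest.main()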
Grig then went on to describe buildbot, a continuous integration testing tool that allows a master system to run build (or indeed any other) commands on remote slaves and aggregate reports from the clients in a simple, readily available web format.
Although some people perform continuous integration on a single platform, buildbot was designed from the start to allow the master to drive the same tests across multiple clients and thereby discover cross-platform bugs that might otherwise go undetected.
Grig ran us through a configuration file. The learning curve can be steep, but once your buildbot is configured it usually needs little maintenance. He also highlighted the fact that someone has to keep on top of buildbots and avoid bitrot, by displaying the currently parlous state of the Python buildbot farm: many clients are down, and those that are running appear to be failing their tests.
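For flavour, a much-reduced master.cfg has roughly the following shape; I'm reconstructing the class names from memory of the 0.7-era API, so treat the imports, option names, and the project URL and slave names as assumptions:

    # master.cfg -- a sketch of a buildbot master configuration: one slave,
    # one builder that checks out the code and runs the test suite, a
    # scheduler that fires on commits, and a web status page.
    c = BuildmasterConfig = {}

    from buildbot.buildslave import BuildSlave
    c['slaves'] = [BuildSlave("linux-slave", "sekrit")]
    c['slavePortnum'] = 9989

    from buildbot.process import factory
    from buildbot.steps.source import SVN
    from buildbot.steps.shell import ShellCommand

    f = factory.BuildFactory()
    f.addStep(SVN(svnurl="http://example.com/svn/myproject/trunk"))
    f.addStep(ShellCommand(command=["python", "setup.py", "test"]))

    c['builders'] = [{'name': 'full-tests',
                      'slavename': 'linux-slave',
                      'builddir': 'full-tests',
                      'factory': f}]

    from buildbot.scheduler import Scheduler
    c['schedulers'] = [Scheduler(name="on-commit", branch=None,
                                 treeStableTimer=60,
                                 builderNames=["full-tests"])]

    from buildbot.status.html import WebStatus
    c['status'] = [WebStatus(http_port=8010)]   # the aggregated web report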
The evening closed with a brief discussion of Fitnesse, a higher-level tool that has you define tests declaratively, in a language that lets you refer to the concepts of the application domain; this makes the tests easier for the customer to comprehend, for example. The tests are maintained in a wiki.
I was beginning to flag a bit by now, so I can't really give you an intelligent summary better than Grig's "it's a Wiki that runs things". Ideally the client will at least understand the test descriptions.
All in all, an excellent and enjoyable presentation that not only gave me some great ideas for testing but also left me more determined to introduce it into my web projects.
2 comments:
I remember you going out for water, but I can't put it exactly in the context of where they were in the tutorial. Regardless, as I recall, every test that failed during the demo failed because it was supposed to fail at that time.
Ken
Great stuff here on test hierarchies and test taxonomies!