
April 2013

Tickets and testing

Some time ago, I received a ticket from a customer to change an A record in a zone, a quick, easy thing to do. I made the change and told them that it was done. A few days later, I got a followup ticket saying “This doesn’t appear to be done.” I checked and, sure enough, the change didn’t appear when I queried the DNS servers. Apparently, in my haste, I had forgotten to do one simple but necessary step: increment the serial number in the zone file. Oops. Chagrined, I incremented the serial number, reloaded the zone file, verified that the change did indeed appear, and apologized profusely to the customer.
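
For anyone who hasn’t had the pleasure of making this particular mistake: the serial number lives in the zone’s SOA record, and secondary name servers only transfer a fresh copy of the zone when that serial increases. A made-up fragment, trimmed to the relevant records (the names and values here are illustrative, not the customer’s actual zone), might look something like this:

$TTL 3600
@    IN  SOA  ns1.example.com. hostmaster.example.com. (
         2013040201  ; serial - bump this or secondaries keep serving the old data
         3600        ; refresh
         900         ; retry
         604800      ; expire
         86400 )     ; negative caching TTL
foo  IN  A    10.1.1.2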

If only I had tested the change before deciding I was done with the work, I would have found my error before my customer did.

Mistakes like this don’t happen often, but they do happen. I know I have made many errors like this one throughout my time as a system administrator.

One Friday about a year ago, I handled a particularly hairy ticket. I spent the entire weekend half-expecting, and dreading, that I would get a phone call from the person on call telling me that I had screwed up. That call never came, so I guess I did it right. On the following Monday, I decided to avoid weekends like that one and adopted a new policy: test the work I do before saying that I’m done. Before doing the work, I set up a test for the desired outcome, verify that the test fails, do the work, and then verify that the test passes.

An Example

To give you an idea of how this works, let’s do an example based on the scenario I mentioned earlier. Since I use Cucumber for my tests, I write a feature that looks like:

Feature: Change A record for foo.example.com
  Scenario:
    When I query for the A record for "foo.example.com"
    Then I should get a response with the A record "10.1.1.2"
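
Behind a feature like this sit step definitions. Mine are written in Ruby; a minimal sketch using the standard library’s Resolv, not necessarily word-for-word what my real steps do, might look something like this:

require 'resolv'

When(/^I query for the A record for "([^"]*)"$/) do |name|
  Resolv::DNS.open do |dns|
    # Collect every A record the resolver returns for the name
    @a_records = dns.getresources(name, Resolv::DNS::Resource::IN::A)
  end
end

Then(/^I should get a response with the A record "([^"]*)"$/) do |expected|
  addresses = @a_records.map { |record| record.address.to_s }
  unless addresses.include?(expected)
    raise "expected #{expected}, got #{addresses.inspect}"
  end
end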

I then run this feature and verify that it fails:
Cucumber says it failed!

I make the change on the DNS server and run the test again:
Cucumber says it failed again!

What? It failed? Oh, I forgot to increment the serial number again. Let’s fix that and:
Cucumber says it passed!

It passed! That means I’m done and can tell that to my customer, confident that an error hasn’t slipped through.

What I’ve learned

Having done this for about a year, I’ve learned several things.

  1. There is a cost to testing. If I already have step definitions set up for my test, it only adds a few minutes. If I have to write new step definitions, it takes a lot longer. (Some step definitions, particularly those that have to scrape websites, have taken hours to write.) Even if it only takes a few minutes to set up and run the test, that’s still a few minutes.
  2. Some things cannot or should not be tested directly. In these cases, indirect tests are needed. Instead of verifying the desired behavior, you might need to test the configuration that specifies the desired behavior. Care must be taken when using indirect tests since they are not as reliable as direct tests.

    For example, most tasks involving email delivery or routing should not be tested directly, either because test messages would be seen as unwanted noise by the customer or because you do not have the login credentials for the email account. If your customer wants mail sent to one address to be forwarded to another, you can instead test the mail server configuration to verify that the forwarding is set up. (There is a sketch of such a test after this list.)

  3. Make sure that the test checks everything that is being done. (This is especially true for indirect tests.) If your test only checks part of the functionality, your test may pass but you’re not actually done. For example, if you’re supposed to set up a website that pulls data from a database, make sure you test for content that’s in the database. If you only test for the presence of a string from static content (say, from the template for the site), your test will pass even though you haven’t set up the database privileges correctly.
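
To make the indirect test in point 2 concrete, here is a sketch of what such a feature might look like for a forwarding request handled by Postfix (the addresses and the step wording are made up):

Feature: Forward mail for alice@example.com
  Scenario:
    When I look up the virtual alias for "alice@example.com" on the mail server
    Then the alias should point to "bob@example.org"

The step definitions behind it would inspect the configuration, for example by running something like postmap -q against the virtual alias table, rather than sending a real message. Checking the configuration instead of the behavior is exactly what makes the test indirect.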

I believe in testing. Why should you?

Every error has costs. The obvious cost is time: It takes time to fix an error. This time may exceed the original time it took to do the work.

The more important cost is in customer respect and goodwill. Every error that your customers see takes away from the respect and goodwill you’ve built up. Sadly, it’s a lot easier to lose respect than to gain it. If your customers see enough errors from you that you deplete that respect, there will be consequences. Even if they don’t quit your services entirely, they are sure to tell their friends and colleagues that you’re incompetent, untrustworthy, and, perhaps most damning, unprofessional.

“But I don’t have time to test!” you might say. That’s precisely when you should be testing! If you’re doing everything at a breakneck pace, you’re sure to make mistakes. When you’re stressed and overloaded is precisely when you should slow down and do things carefully. Every error you catch now saves you from paying the cost of that error later.

How has this worked out over the past year?

Over the past year, I have tested most of the work I have done. Various reasons have prevented me from testing all of it.

I feel like my customer-visible error rate has decreased significantly. (I don’t have accurate statistics, so I can’t say for sure.) Of all the mistakes my customers reported in the past year, only one was for work I had tested, and that was because the test was incomplete. All other reported errors were for untested work. Said another way: my customer-visible error rate for properly tested work is 0%. I think that’s a pretty good track record.

Closing thoughts

Tests that check your work will help you decrease your customer-visible error rate. They won’t help you make fewer errors, but they will keep your customers from seeing the ones you do make.

I recommend trying it out. Write a test for a particular work item, do the work, and make your test pass. Now do the same for your next work item. And then the one after that. And so on.

More on acceptance testing

A couple of days ago, I wrote about writing tests for tickets. (Didn’t read that post? Go on and read it. All done? Good.) I figured I should say more about the theory behind the practice.

Theory

The tests I write for tickets are based on the concept of “acceptance tests” from agile development. Acceptance tests are usually written by customers and developers together to determine when a given feature is done. A feature cannot be considered finished (and should not be released to the customer) until all of the acceptance tests for the feature pass.

Acceptance tests are written using the language of the business domain and are understandable by both the customers and the developers. They often follow the pattern of “Given/When/Then”, i.e. “Given something is true, When I do something, Then I should see something.” Implementation-specific language is not used in these tests.
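
A hypothetical scenario in that style, written entirely in the customer’s language with no implementation details, might read:

Scenario: Mail to the sales address reaches the sales team
  Given "sales@example.com" is set up to forward to the sales team
  When a customer sends mail to "sales@example.com"
  Then every member of the sales team should receive that message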

My testing process is based on the practice of “acceptance test-driven development” (ATDD). Under ATDD, acceptance tests are written before any code is written. Once the test is verified to fail, the developer writes the code. (By verifying that the test fails first, you know that it was the code you wrote that made it pass.) When the developer believes they are done or wants to check their work so far, they run the tests. Only when the tests pass, and not before, can they consider their work done.

Sysadmin reality

As a system administrator, I do not have the luxury to work out these tests directly with my customer. I have to rely on what they have written in their ticket (or said over the phone) to figure out what they want. Sometimes, I have to ask for clarification to make sure I understand their request well enough to write the tests.

Since I have to work out on my own what my customer wants, the tests I write are not guaranteed to be as effective at establishing when I am truly done. After all, my understanding of what they want may be incomplete, and so, while my tests say I am done with the request, I may not really be done: I have not done everything my customer wanted.

One benefit of having to devise these tests myself is that I can use implementation-specific language. This lets me simplify the implementations of the tests, although maybe not the tests themselves. For example, I can write implementation-specific steps for getting the list of email accounts from mail servers (e.g. “When I get the list of accounts in Postfix”, “When I get the list of accounts in Exchange”, etc.) without having to write conditional and abstraction logic within the test implementation.
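
As a sketch of the difference (the step text is from the example above; the helper methods are hypothetical), each implementation-specific step definition stays trivial, whereas a single generic “get the list of accounts from the mail server” step would have to detect which mail server is in play and branch:

When(/^I get the list of accounts in Postfix$/) do
  # Hypothetical helper that reads the account list from the Postfix server
  @accounts = postfix_account_list
end

When(/^I get the list of accounts in Exchange$/) do
  # Hypothetical helper that asks Exchange for its mailboxes
  @accounts = exchange_account_list
end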

To learn more

ATDD is covered in detail in ATDD By Example by Markus Gärtner. In addition to discussing the idea behind ATDD, he provides some coverage of two tools, Cucumber and FitNesse. The Cucumber Book by Matt Wynne and Aslak Hellesøy covers Cucumber in more detail.

The second part of The RSpec Book by David Chelimsky et al. covers behavior-driven development (BDD), which is similar to and incorporates aspects of ATDD. (One of the foundations of BDD is “acceptance test-driven planning.”) I do not believe my testing methodology follows BDD since I am not always testing behavior. (See my comments about indirect tests in my last post.)

Finally, in chapter seven of The Clean Coder, Robert Martin discusses acceptance testing and the importance of establishing a correct definition of done.