Business Risks, Product Lifecycle, and Testing
"We've automated our tests" - The first thing that pops in my head when I hear this is "I wonder what the tests actually cover".
At heart, testing is all about coverage - code coverage, requirements coverage, business case coverage, and so on. Automating those tests means that when you kick them off, they run through all of this for you, so you don't have to remember to do everything by hand. What automation doesn't tell you is anything about the quality of the tests themselves.
Yes, I know, that is me belaboring the obvious, but I do so for a reason - most tests have precious little to do with business risks. And that matters because, in the end, everything you are working on is aimed at driving business value. (•)
Hold this thought, we'll get back to it in a moment - let's continue with testing. The thing to remember here is that you are never done testing. There is always some set of tests that you could be doing. Stress tests, load tests, soak tests, security tests, oh, the list goes on forever. And the tests themselves might go on forever, with some of the tests consisting of long-running stability tests, honeypots, and whatnot.
Once you accept that you are never done testing, you can now start looking at your tests from a Value vs Risk perspective:
• Risk = negative_impact * likelihood
• Value = positive_impact * likelihood
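To make that concrete, here is a minimal sketch of scoring test suites by the risk they cover. The suite names, impact estimates, and likelihoods are entirely made up for illustration - plug in whatever units your business actually reasons in (dollars per incident, users affected, and so on):

```python
def risk(negative_impact: float, likelihood: float) -> float:
    return negative_impact * likelihood

def value(positive_impact: float, likelihood: float) -> float:
    return positive_impact * likelihood

# Hypothetical estimates per area of the product.
suites = {
    "checkout_flow":   {"negative_impact": 50_000, "likelihood": 0.10},
    "recommendations": {"negative_impact": 2_000,  "likelihood": 0.30},
    "admin_reports":   {"negative_impact": 500,    "likelihood": 0.05},
}

# Spend your (finite) testing effort where the risk is highest.
for name, est in sorted(suites.items(),
                        key=lambda kv: risk(**kv[1]),
                        reverse=True):
    print(f"{name}: covered risk = {risk(**est):.0f}")
```

The point isn't the arithmetic, it's that the ordering forces an explicit conversation about which failures the business actually cares about.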
Positive impact is a lot easier to measure and falsify ("with this new feature we expect 37 new customers per day", etc.); any halfway decent product lifecycle will have this baked into it.
Negative impacts can seem harder to measure, but that is because we tend to think of them as "bugs that escaped" rather than as "metrics around features". Consider the difference between these two statements:
1. "The latency should be < 100ms"
2. "The 99th percentile latency should be < 100ms"
They can both be perfectly valid requirements, but which one applies to you depends on what your actual business requirements are. Are you a lifeline service, with no margin for error? Or are you a video feed where you're happy if 99% of your customers are happy?
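To see how differently those two requirements behave as tests, here is a small, self-contained sketch. The latencies are simulated here; in reality they would come from your load-test harness:

```python
import random
import statistics

# Stand-in for real measurements - e.g. latencies (in ms) collected by
# replaying traffic against a release candidate. Simulated for illustration.
random.seed(42)
latencies = [random.lognormvariate(3.5, 0.4) for _ in range(10_000)]

# Requirement 1: "The latency should be < 100ms", read strictly -
# every single request has to come in under 100ms (the lifeline case).
strict_ok = max(latencies) < 100

# Requirement 2: "The 99th percentile latency should be < 100ms" -
# only 99% of requests have to make it (the video-feed case).
p99 = statistics.quantiles(latencies, n=100)[98]
p99_ok = p99 < 100

print(f"max = {max(latencies):.0f}ms (pass: {strict_ok}), "
      f"p99 = {p99:.0f}ms (pass: {p99_ok})")
```

With this particular simulated distribution the p99 check passes while the strict check fails - the same code, shippable under one business requirement and not the other.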
The thought process that goes into the above requirements needs to be part of your product lifecycle, and your tests need to be comprehensive enough to identify changes that negatively impact those requirements. Do this well, and your tests will adequately reflect business risk (••). Do it poorly, and even though your tests pass, you won't actually have much insight into the viability of your releases.
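One way to make this bite is to fail the pipeline when a requirement is violated, so a release candidate that increases business risk never ships. A minimal sketch, where the requirement values, the results file, and its format are all hypothetical:

```python
import json
import sys

# Requirements expressed as hard limits the release candidate must stay under.
REQUIREMENTS = {
    "p99_latency_ms": 100,   # "The 99th percentile latency should be < 100ms"
    "error_rate": 0.001,     # at most 0.1% of requests may fail
}

# Assumed to be produced by an earlier load-test stage in the pipeline.
with open("perf_results.json") as f:
    results = json.load(f)

violations = [name for name, limit in REQUIREMENTS.items()
              if results.get(name, float("inf")) >= limit]

if violations:
    print(f"Rejecting release candidate, requirements violated: {violations}")
    sys.exit(1)   # a nonzero exit blocks the rest of the delivery pipeline

print("All business-risk checks passed")
```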
(•) For a given definition of the term "business". Let's just leave it at that.
(••) Incidentally, the technical term for this is Continuous Testing. From Wikipedia: "Continuous testing is the process of executing automated tests as part of the software delivery pipeline to obtain immediate feedback on the business risks associated with a software release candidate."