A while back, I was doing some lectures on advanced software testing technologies. One topic was combinatorial testing. Looking at the materials, there are good and free tools out there to generate tests to cover various combinations. Still, I don’t see many people use them, and the materials out there don’t seem too great.
Combinatorial testing here refers to having 2-way, 3-way, up to N-way (sometimes they seem to call it t-way…) combinations of data values in different test cases. 2-way is also called pairwise testing. This simply refers to all pairs of data values appearing in different test cases. For example, if one test uses values “A” and “B”, and another uses a combination of “A” and “C”, you would have covered the pairs A+B and A+C but not B+C. With large numbers of potential values, the set of potential combinations can grow pretty huge, so finding a minimal set to cover all combinations can be very useful.
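The A/B/C example above can be sketched in a few lines of Python. This is only an illustration of what "covering a pair" means; it treats values as plain labels and ignores which parameter they belong to, and the function name `covered_pairs` is made up for this sketch.

```python
from itertools import combinations

def covered_pairs(tests):
    """Return the set of unordered value pairs that appear together in some test."""
    pairs = set()
    for test in tests:
        # sorted() gives each pair a stable order, so (A, B) and (B, A) are the same pair
        for a, b in combinations(sorted(test), 2):
            pairs.add((a, b))
    return pairs

# One test uses A and B, another uses A and C.
tests = [{"A", "B"}, {"A", "C"}]
all_pairs = set(combinations(["A", "B", "C"], 2))

covered = covered_pairs(tests)
missing = all_pairs - covered
print(sorted(covered))  # [('A', 'B'), ('A', 'C')]
print(sorted(missing))  # [('B', 'C')]
```

A third test containing B and C would close the gap.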
The benefits
There is a nice graph over at NIST, including a PDF with a broader description. Basically these show that 2-way and 3-way combinations already show very high gains in finding defects over considering coverage of single variables alone. Of course, things get a bit more complicated when you need to find all relevant variables in the program control flow, how to define what you can combine, all the constraints, etc. Maybe later. Now I just wanted to try the combinatorial test generation.
Do Not. Try. Bad Yoda Joke. Do Try.
So I gave combinatorial test generation a go, using PICT, a nice and freely available tool from Microsoft Research. It even compiles on different platforms, not just Windows. Or so they say on their GitHub.
Unexpectedly, compiling and getting PICT to run on my OSX was quite simple. Just “make” and “make test” as suggested on the main GitHub page. Probably I had most dependencies already from before, but anyway, it was surprisingly easy.
I made “mymodels” and “myoutputs” directories under the directory where I cloned the git repository and compiled the code, just to keep some order to my stuff. This is why the following example commands work.
I started with the first example on PICT documentation page. The model looks like this:
Type: Primary, Logical, Single, Span, Stripe, Mirror, RAID-5
Size: 10, 100, 500, 1000, 5000, 10000, 40000
Format method: quick, slow
File system: FAT, FAT32, NTFS
Cluster size: 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
Compression: on, off
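To get a feel for the size of this model, here is a small Python sketch that counts the exhaustive combinations, the number of distinct value pairs a pairwise set has to cover, and a simple lower bound on the pairwise test count. The variable names are made up for this sketch and nothing here calls PICT.

```python
from itertools import combinations
from math import prod

model = {
    "Type": ["Primary", "Logical", "Single", "Span", "Stripe", "Mirror", "RAID-5"],
    "Size": [10, 100, 500, 1000, 5000, 10000, 40000],
    "Format method": ["quick", "slow"],
    "File system": ["FAT", "FAT32", "NTFS"],
    "Cluster size": [512, 1024, 2048, 4096, 8192, 16384, 32768, 65536],
    "Compression": ["on", "off"],
}

sizes = [len(values) for values in model.values()]

# Every value of every parameter combined with every other: the exhaustive test set.
exhaustive = prod(sizes)

# Number of distinct value pairs across all pairs of parameters.
pair_count = sum(a * b for a, b in combinations(sizes, 2))

# Each pair of values from the two largest parameters needs its own test.
lower_bound = prod(sorted(sizes)[-2:])

print(exhaustive, pair_count, lower_bound)  # 4704 331 56
```

So no pairwise set for this model can have fewer than 56 tests, while the exhaustive set would have 4704.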
Running the tool and getting some output is actually simpler than I expected:
./pict mymodels/example1.pict >myoutputs/example1.txt
PICT prints the list of generated test value combinations to the standard output, which generally just means printing a bunch of lines on the console/screen. To save the generated values, I just redirect the output to myoutputs/example1.txt, as shown above. In this case, the output looks like this:
Type     Size   Format method  File system  Cluster size  Compression
Stripe   100    quick          FAT32        1024          on
Logical  10000  slow           NTFS         512           off
Primary  500    quick          FAT          65536         off
Span     10000  slow           FAT          16384         on
Logical  40000  quick          FAT32        16384         off
Span     1000   quick          NTFS         512           on
Span     10     slow           FAT32        32768         off
Stripe   5000   slow           NTFS         32768         on
RAID-5   500    slow           FAT          32768         on
Mirror   1000   quick          FAT          32768         off
Single   10     quick          NTFS         4096          on
RAID-5   100    slow           FAT32        4096          off
Mirror   100    slow           NTFS         65536         on
RAID-5   40000  quick          NTFS         2048          on
Stripe   5000   quick          FAT          4096          off
Primary  40000  slow           FAT          8192          on
Mirror   10     quick          FAT32        8192          off
Span     500    slow           FAT          1024          off
Single   1000   slow           FAT32        2048          off
Stripe   500    quick          NTFS         16384         on
Logical  10     quick          FAT          2048          on
Stripe   10000  quick          FAT32        512           off
Mirror   500    quick          FAT32        2048          on
Primary  10     slow           FAT32        16384         on
Single   10     quick          FAT          512           off
Single   10000  quick          FAT32        65536         off
Primary  40000  quick          NTFS         32768         on
Single   100    quick          FAT          8192          on
Span     5000   slow           FAT32        2048          on
Single   5000   quick          NTFS         16384         off
Logical  500    quick          NTFS         8192          off
RAID-5   5000   quick          NTFS         1024          on
Primary  1000   slow           FAT          1024          on
RAID-5   10000  slow           NTFS         8192          on
Logical  100    quick          NTFS         32768         off
Primary  10000  slow           FAT          32768         on
Stripe   40000  quick          FAT32        65536         on
Span     40000  quick          FAT          4096          on
Stripe   1000   quick          FAT          8192          off
Logical  1000   slow           FAT          4096          off
Primary  100    quick          FAT          2048          off
Single   40000  quick          FAT          1024          off
RAID-5   1000   quick          FAT          16384         on
Single   500    quick          FAT32        512           off
Stripe   10     quick          NTFS         2048          off
Primary  100    quick          NTFS         512           off
Logical  10000  slow           NTFS         1024          off
Mirror   5000   quick          FAT          512           on
Logical  5000   slow           NTFS         65536         off
Mirror   10000  slow           FAT          2048          off
RAID-5   10     slow           FAT32        65536         off
Span     100    quick          FAT          65536         on
Single   5000   quick          FAT          32768         on
Span     1000   quick          NTFS         65536         off
Primary  500    slow           FAT32        4096          off
Mirror   40000  slow           FAT32        4096          off
Mirror   10     slow           FAT32        1024          off
Logical  10000  quick          FAT          4096          off
Span     5000   slow           FAT          8192          off
RAID-5   40000  quick          FAT32        512           on
Primary  5000   quick          NTFS         1024          off
Mirror   100    slow           FAT32        16384         off
The first line is the header, and values/columns are separated by tabulator characters (tabs).
The output above is 62 generated combinations/test cases as evidenced by:
wc -l myoutputs/example1.txt
63 myoutputs/example1.txt
(wc -l counts lines, and the first line is the header, so I subtract 1)
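For automated use, the same tab-separated output is easy to load with Python's standard `csv` module. A minimal sketch, assuming the output has a header row followed by one test per line as described above; the function name `read_pict_output` and the two sample rows are just for illustration.

```python
import csv
import io

def read_pict_output(text):
    """Parse tab-separated PICT output into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

# Header plus the first two generated rows from the output above.
sample = (
    "Type\tSize\tFormat method\tFile system\tCluster size\tCompression\n"
    "Stripe\t100\tquick\tFAT32\t1024\ton\n"
    "Logical\t10000\tslow\tNTFS\t512\toff\n"
)

rows = read_pict_output(sample)
print(len(rows))               # 2
print(rows[0]["File system"])  # FAT32
```

Since the header is consumed by `DictReader`, `len(rows)` is already the test count, with no need to subtract one. To read a real file, pass the contents of `open("myoutputs/example1.txt", newline="").read()` instead of the sample string.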
To produce all 3-way combinations with PICT, the syntax is:
./pict mymodels/example1.pict >myoutputs/example1.txt /o:3
which generates 392 combinations/test cases:
wc -l myoutputs/example1.txt
393 myoutputs/example1.txt
I find the PICT command-line syntax a bit odd, as parameters have to be the last elements on the line, and they are identified by these strange symbols like “/o:”. But it works, so great.
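If the odd syntax gets in the way, it can be hidden behind a small wrapper. Below is a minimal Python sketch; the function names and the default `./pict` path are assumptions of this sketch, and the only PICT option it knows about is the `/o:` shown above.

```python
import subprocess

def pict_command(model_path, order=None, pict="./pict"):
    """Build the argument list for a PICT run; options go after the model file."""
    cmd = [pict, model_path]
    if order is not None:
        cmd.append(f"/o:{order}")
    return cmd

def run_pict(model_path, output_path, order=None, pict="./pict"):
    """Run PICT and write its standard output to output_path."""
    cmd = pict_command(model_path, order=order, pict=pict)
    with open(output_path, "w") as out:
        # check=True raises an error if PICT exits with a non-zero status
        subprocess.run(cmd, stdout=out, check=True)

print(pict_command("mymodels/example1.pict", order=3))
# ['./pict', 'mymodels/example1.pict', '/o:3']
```

Calling `run_pict("mymodels/example1.pict", "myoutputs/example1.txt", order=3)` would then do the same as the shell command above, provided the PICT binary is in the current directory.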
Constraints
Of course, not all combinations are always valid. So PICT has extensive support to define constraints on the generator model, to limit what kind of combinations PICT generates. The PICT documentation page has lots of good examples. This part actually seems nicely documented. But let’s try a few just to see what happens. The basic example from the page:
Type: Primary, Logical, Single, Span, Stripe, Mirror, RAID-5
Size: 10, 100, 500, 1000, 5000, 10000, 40000
Format method: quick, slow
File system: FAT, FAT32, NTFS
Cluster size: 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
Compression: on, off

IF [File system] = "FAT" THEN [Size] <= 4096;
IF [File system] = "FAT32" THEN [Size] <= 32000;

./pict mymodels/example2.pict >myoutputs/example2.txt
wc -l myoutputs/example2.txt
63 myoutputs/example2.txt
So the same number of tests. The contents:
Type     Size   Format method  File system  Cluster size  Compression
Stripe   500    slow           NTFS         1024          on
Primary  500    quick          FAT32        512           off
Single   10     slow           FAT          1024          off
Single   5000   quick          FAT32        32768         on
Span     40000  quick          NTFS         16384         off
Mirror   40000  slow           NTFS         512           on
RAID-5   100    quick          FAT          8192          on
Logical  500    slow           FAT          2048          off
Span     10000  slow           FAT32        1024          on
Logical  1000   slow           FAT32        16384         on
Span     1000   quick          FAT          512           off
Primary  10     quick          NTFS         1024          on
Mirror   1000   quick          NTFS         4096          off
RAID-5   40000  slow           NTFS         1024          off
Single   40000  slow           NTFS         8192          off
Stripe   10     slow           FAT32        4096          on
Stripe   40000  quick          NTFS         2048          on
Primary  100    slow           NTFS         32768         off
Stripe   500    quick          FAT          16384         off
RAID-5   1000   quick          FAT32        2048          off
Mirror   10     quick          FAT          65536         off
Logical  40000  quick          NTFS         4096          on
RAID-5   5000   slow           NTFS         512           off
Stripe   5000   slow           FAT32        65536         on
Span     10     quick          FAT32        2048          off
Logical  10000  quick          NTFS         65536         off
Primary  1000   slow           FAT          65536         off
Mirror   500    quick          FAT          32768         on
Single   100    quick          FAT32        512           on
Mirror   5000   slow           FAT32        2048          on
Mirror   100    quick          NTFS         2048          on
Logical  5000   quick          FAT32        8192          off
Logical  100    slow           FAT32        1024          on
Primary  100    quick          FAT32        16384         off
Primary  10000  quick          FAT32        2048          on
RAID-5   10     slow           FAT          32768         off
Mirror   10     quick          FAT          16384         on
Single   500    slow           FAT          4096          on
Span     500    slow           FAT32        8192          on
Stripe   10000  quick          FAT32        32768         off
Logical  1000   slow           NTFS         32768         on
Single   10000  slow           NTFS         16384         off
Span     100    slow           FAT32        4096          on
Stripe   1000   slow           NTFS         8192          on
Span     5000   quick          NTFS         32768         on
Primary  5000   slow           FAT32        4096          off
RAID-5   100    slow           FAT          65536         off
RAID-5   10000  slow           FAT32        4096          on
Single   1000   quick          FAT          1024          on
Mirror   10     quick          FAT          1024          on
Logical  5000   slow           FAT32        1024          off
Single   500    slow           FAT32        65536         off
Logical  10     quick          NTFS         512           on
Single   1000   slow           FAT          2048          off
Mirror   10000  quick          NTFS         8192          on
Primary  10     quick          FAT32        8192          on
Primary  40000  slow           NTFS         32768         off
Stripe   100    slow           FAT          512           off
Mirror   10000  slow           FAT32        512           on
RAID-5   5000   quick          NTFS         16384         off
Span     40000  quick          NTFS         65536         on
RAID-5   500    quick          FAT          4096          on
Comparing the “Size” column with the “File system” column, the “FAT” file system type now always has a size smaller than 4096. So it works as expected. I have to admit, I found the value 4096 very confusing here, since there is no option of 4096 in the input model for “Size” but there is for “Cluster size”. I was looking at the wrong column initially, wondering why the constraint was not working. But it works, it is just a slightly confusing example.
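Rather than eyeballing the columns, a check like this can be automated. A minimal Python sketch, assuming the rows have already been loaded as dicts keyed by column name; the function name `violations` is made up, the first three rows are taken from the output above, and the last row is a deliberately invalid one invented for the demonstration.

```python
def violations(rows, max_fat_size=4096):
    """Return rows that break: IF [File system] = "FAT" THEN [Size] <= max_fat_size."""
    return [
        row for row in rows
        if row["File system"] == "FAT" and int(row["Size"]) > max_fat_size
    ]

# Three rows from the generated output (only the relevant columns).
rows = [
    {"Type": "Single", "Size": "10", "File system": "FAT"},
    {"Type": "Span", "Size": "40000", "File system": "NTFS"},
    {"Type": "RAID-5", "Size": "100", "File system": "FAT"},
]
print(violations(rows))  # []

# Hypothetical bad row, not from PICT, to show what a violation looks like.
bad_rows = rows + [{"Type": "Span", "Size": "5000", "File system": "FAT"}]
print(len(violations(bad_rows)))  # 1
```

Checking the right column by name also avoids the “Size” versus “Cluster size” mix-up.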
Similarly, 3-way combinations produce the same number of tests (as it did without any constraints) even with these constraints:
./pict mymodels/example2.pict >myoutputs/example2.txt /o:3
wc -l myoutputs/example2.txt
393 myoutputs/example2.txt
To experiment a bit more, I set a limit on FAT size to be 100 or less:
Type: Primary, Logical, Single, Span, Stripe, Mirror, RAID-5
Size: 10, 100, 500, 1000, 5000, 10000, 40000
Format method: quick, slow
File system: FAT, FAT32, NTFS
Cluster size: 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
Compression: on, off

IF [File system] = "FAT" THEN [Size] <= 100;
IF [File system] = "FAT32" THEN [Size] <= 32000;

./pict mymodels/example3.pict >myoutputs/example3.txt
wc -l myoutputs/example3.txt
62 myoutputs/example3.txt

./pict mymodels/example3.pict >myoutputs/example3.txt /o:3
wc -l myoutputs/example3.txt
397 myoutputs/example3.txt
What happened here?
Running the 2-way generator produces 61 tests. So the additional constraint finally reduced the number of generated combinations, by one.
Running the 3-way generator produces 396 tests. So the number of tests/combinations generated increased by 4, compared to the 3-way generator without this constraint. Which is odd, as I would expect the number of tests to go down when there are fewer options. In fact, it seems you could stay at 392 by just taking the 392 tests from the previous generator run with fewer constraints, then taking every line with “FAT” for “File system” and, if the “Size” for those is bigger than 100, replacing it with either 100 or 10. That said, such a replacement could drop “Size” combinations that only those lines covered, so full 3-way coverage might not survive it, and a few extra tests may genuinely be needed.
My guess is this is because building the set of inputs to cover all requested combinations is a very hard problem. I believe in computer science this would be called an NP-hard problem (or so I gather from the academic literature for combinatorial testing, even if they seem to call the test set a “covering array” and other academic tricks). So no efficient algorithm is known that would always produce the optimal (smallest) result. The generator then has to accommodate all the possible constraints in its heuristics, and ends up with a slightly bigger set. It is still likely a very nicely optimized set. Glad it’s not me having to write those algorithms :). I just use them and complain :).
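To illustrate why generators settle for heuristics, here is a toy greedy pairwise generator in Python. It is not PICT's algorithm, just a common textbook-style approach: enumerate all valid tests and repeatedly pick the one that covers the most still-uncovered pairs. It only works for tiny models, gives no optimality guarantee, and the constraint used here is a hypothetical one invented for the sketch.

```python
from itertools import combinations, product

def pairs_of(test, names):
    """Return the (parameter, value) pairs that one test covers."""
    items = [(name, test[name]) for name in names]
    return set(combinations(items, 2))

def greedy_pairwise(model, is_valid=lambda test: True):
    """Greedy heuristic: repeatedly pick the valid test covering the most new pairs."""
    names = list(model)
    candidates = [dict(zip(names, values)) for values in product(*model.values())]
    candidates = [c for c in candidates if is_valid(c)]
    # Only pairs that occur in at least one valid test need to be covered.
    uncovered = set().union(*(pairs_of(c, names) for c in candidates))
    tests = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c, names) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best, names)
    return tests

# Three parameters from the example model, to keep the search tiny.
small_model = {
    "Format method": ["quick", "slow"],
    "File system": ["FAT", "FAT32", "NTFS"],
    "Compression": ["on", "off"],
}

# Hypothetical constraint, invented for this sketch: no compression on FAT.
def no_fat_compression(test):
    return not (test["File system"] == "FAT" and test["Compression"] == "on")

unconstrained = greedy_pairwise(small_model)
constrained = greedy_pairwise(small_model, no_fat_compression)
print(len(unconstrained), len(constrained))
```

Enumerating every candidate is exactly what does not scale to real models, which is where the cleverer algorithms in tools like PICT come in.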
PICT has a bunch of other ways to define conditional constraints with the use of IF, THEN, ELSE, NOT, OR, AND statements. The docs cover that nicely. So let’s not go there.
The Naming Trick
Something I found interesting is a way to build models by naming different items separately, and constraining them separately:
#
# Machine 1
#
OS_1: Win7, Win8, Win10
SKU_1: Home, Pro
LANG_1: English, Spanish, Chinese

#
# Machine 2
#
OS_2: Win7, Win8, Win10
SKU_2: Home, Pro
LANG_2: English, Spanish, Chinese, Hindi

IF [LANG_1] = [LANG_2]
THEN [OS_1] <> [OS_2] AND [SKU_1] <> [SKU_2];
Here we have two items (“machines”) with the same three properties (“OS”, “SKU”, “LANG”). However, by numbering the properties, the generator sees them as different. From this, the generator can now build combinations of different two-machine configurations, using just the basic syntax and no need to tweak the generator itself. The only difference between the two is that “Machine 2” can have one additional language (“Hindi”).
The constraint at the end also nicely ensures that if the generated configurations have the same language, the OS and SKU should be different.
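To see how much this constraint prunes, the model is small enough to enumerate fully in Python. This is a sketch of the constraint's logic only, independent of PICT; the function name `allowed` is made up.

```python
from itertools import product

model = {
    "OS_1": ["Win7", "Win8", "Win10"],
    "SKU_1": ["Home", "Pro"],
    "LANG_1": ["English", "Spanish", "Chinese"],
    "OS_2": ["Win7", "Win8", "Win10"],
    "SKU_2": ["Home", "Pro"],
    "LANG_2": ["English", "Spanish", "Chinese", "Hindi"],
}

def allowed(config):
    """IF [LANG_1] = [LANG_2] THEN [OS_1] <> [OS_2] AND [SKU_1] <> [SKU_2]."""
    if config["LANG_1"] == config["LANG_2"]:
        return config["OS_1"] != config["OS_2"] and config["SKU_1"] != config["SKU_2"]
    return True

names = list(model)
configs = [dict(zip(names, values)) for values in product(*model.values())]
valid = [c for c in configs if allowed(c)]
print(len(configs), len(valid))  # 432 360
```

So the constraint removes 72 of the 432 possible two-machine configurations before any pairwise reduction happens.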
Scaling these “machine” combinations to a large number of machines would require a different type of approach, since it is doubtful anyone would like to write a model with 100 machines, each separately labeled. No idea what modelling approach would be the best for that, but right now I don’t have a big requirement for it, so I am not going there. Maybe a different approach would be to have the generator produce a more abstract set of combinations, and map those to a large number of “machines” somehow.
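One low-tech option for the labeling chore is to generate the model text with a script. A minimal Python sketch, with the function name `machine_model` made up for illustration; it only produces the parameter definitions in the syntax shown above, adds no constraints, and says nothing about how well PICT copes with a model that large.

```python
def machine_model(count, oses, skus, langs):
    """Build PICT model text with one OS/SKU/LANG parameter group per machine."""
    lines = []
    for i in range(1, count + 1):
        lines.append(f"# Machine {i}")
        lines.append(f"OS_{i}: {', '.join(oses)}")
        lines.append(f"SKU_{i}: {', '.join(skus)}")
        lines.append(f"LANG_{i}: {', '.join(langs)}")
        lines.append("")  # blank line between machines
    return "\n".join(lines)

text = machine_model(
    3,
    ["Win7", "Win8", "Win10"],
    ["Home", "Pro"],
    ["English", "Spanish", "Chinese"],
)
print(text)
```

Writing `text` to a file under mymodels/ would give a model that can be fed to PICT like the hand-written ones.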
Repetition and Value References
There is quite a bit of repetition in the above model with both machines repeating all the same parameter values. PICT has a way to address this by referencing values defined for other parameters:
#
# Machine 1
#
OS_1: Win7, Win8, Win10
SKU_1: Home, Pro
LANG_1: English, Spanish, Chinese

#
# Machine 2
#
OS_2: <OS_1>
SKU_2: <SKU_1>
LANG_2: <LANG_1>, Hindi
So in this case, “machine 2” is repeating the values from “machine 1”, and changing them in “machine 1” also changes them in “machine 2”. Sometimes that is good, other times maybe not. Because changing one thing would change many, and you might not remember that every time. On the other hand, you would not want to be manually updating all items with the same info every time. But a nice feature to have if you need it.
Data Types
With regards to variable types, PICT supports numbers and strings. So this is given as an example model:
Size: 1, 2, 3, 4, 5
Value: a, b, c, d

IF [Size] > 3 THEN [Value] > "b";
I guess the two types are there because you can then define different types of constraints on them. For example, [Size] > 3 makes sense. The [Value] > "b" part a bit less. So let’s try that:
./pict mymodels/example4.pict >myoutputs/example4.txt
wc -l myoutputs/example4.txt
17 myoutputs/example4.txt
The output looks like this:
Size  Value
3     a
2     c
1     c
2     b
2     a
1     d
1     a
3     b
4     d
2     d
3     d
1     b
5     c
3     c
4     c
5     d
And here, if “Size” equals 4 or 5 (so is >3), “Value” is always “c” or “d”. The PICT docs state “String comparison is lexicographical and case-insensitive by default”. So [> “b”] just refers to letters coming after “b”, which equals “c” and “d” in the choices in this model. It seems a bit odd to define such comparisons against text in a model, but I guess it can help make a model more readable if you can represent values as numbers or strings, and define constraints on them in a similar way.
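The same logic is easy to mirror in Python, where string comparison is also lexicographical. This sketch only imitates the constraint (lower-casing the value to mimic the case-insensitive default quoted above); it does not call PICT, and the function name `allowed` is made up.

```python
from itertools import product

sizes = [1, 2, 3, 4, 5]
values = ["a", "b", "c", "d"]

def allowed(size, value):
    """IF [Size] > 3 THEN [Value] > "b", compared case-insensitively."""
    return size <= 3 or value.lower() > "b"

valid = [(s, v) for s, v in product(sizes, values) if allowed(s, v)]
print(len(valid))                               # 16
print(sorted({v for s, v in valid if s > 3}))   # ['c', 'd']
```

With only two parameters, every valid combination is its own pair, so the 16 here matches the 16 tests PICT generated.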
To verify, I try a slightly modified model, lowering the “Size” threshold in the constraint:
./pict mymodels/example4.pict >myoutputs/example4.txt
wc -l myoutputs/example4.txt
13 myoutputs/example4.txt
So, the number of tests is reduced from 16 to 12, with the following output:
Size  Value
5     c
2     c
1     d
4     d
1     b
4     c
3     d
3     c
2     d
1     c
1     a
5     d
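Which confirms that lines (tests) with the larger “Size” values now have only letters “c” or “d” in them. In this output that holds for every “Size” from 2 upwards, and only “Size” 1 is still paired with “a” and “b”. This naturally also limits the number of available combinations, hence the reduced test set.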
Extra Features
There are some more nice features that are well explained in the PICT docs:
- Submodels: Refers to defining levels of combinations per test. For example, 2-way combinations of OS with all others, and 3-way combination of File System Type with all others, at the same time.
- Aliasing: You can give the same value several names and all are treated as the same value. Not sure why you would want to do that, but anyway.
- Weighting: Since the full set of combinations will have more of some values anyway, this can be used to set preference for specific ones.
Negative Testing / Erroneous Values
A few more interesting ones are “negative testing” and “seeding”. So first negative testing. Negative testing here refers to marking a set of invalid values as mutually exclusive, so those values never appear together in the same test. This is because each of them is expected to produce an error, and you want to make sure the error it produces is visible and not “masked” (hidden) by some other erroneous value.
The example model from PICT docs, with a small modification to name the invalid values differently:
#
# Trivial model for SumSquareRoots
#
A: ~-1, 0, 1, 2
B: ~-2, 0, 1, 2
Running it, we get:
./pict mymodels/example5.pict >myoutputs/example5.txt
wc -l myoutputs/example5.txt
16 myoutputs/example5.txt

A    B
0    2
0    1
1    2
2    1
1    0
2    0
1    1
2    2
0    0
0    ~-2
1    ~-2
~-1  0
~-1  1
2    ~-2
~-1  2
The negative value is prefixed with “~”, and the results show combinations of the two negative values with all possible values of the other variable. So if A is -1, it is combined with 0, 1, 2 for B. If B is -2, it is combined with 0, 1, 2 for A. But -1 and -2 are never paired, to avoid one “faulty” variable masking the other one. I find having the “~” added everywhere a bit distracting. But I guess you could parse around it, so it is not a real issue.
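Parsing around the “~” is indeed only a few lines. A minimal Python sketch, with the function name `split_negative` made up for illustration and a handful of rows copied from the output above; it strips the prefix, records whether the value was marked invalid, and checks that no test carries two invalid values.

```python
def split_negative(value):
    """Strip a leading '~' and report whether the value was marked as invalid."""
    if value.startswith("~"):
        return value[1:], True
    return value, False

# A few (A, B) rows from the generated output.
rows = [("0", "2"), ("0", "~-2"), ("~-1", "0"), ("~-1", "2"), ("2", "~-2")]

parsed = []
for a, b in rows:
    a_value, a_invalid = split_negative(a)
    b_value, b_invalid = split_negative(b)
    # At most one invalid value per test, so errors cannot mask each other.
    assert not (a_invalid and b_invalid)
    parsed.append((int(a_value), int(b_value), a_invalid or b_invalid))

print(parsed)
# [(0, 2, False), (0, -2, True), (-1, 0, True), (-1, 2, True), (2, -2, True)]
```

The boolean flag can then tell the test harness whether to expect an error from that test.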
Of course, there is nothing to stop us from setting the set of possible values to include -1 and -2, and getting combinations of several “negative” values. Let’s try:
A: -1, 0, 1, 2
B: -2, 0, 1, 2

./pict mymodels/example6.pict >myoutputs/example6.txt
wc -l myoutputs/example6.txt
17 myoutputs/example6.txt

A    B
1    -2
2    0
1    0
-1   0
0    -2
2    1
-1   -2
0    0
1    2
-1   2
0    2
2    -2
1    1
-1   1
0    1
2    2
So there we go. This produced one test more than the previous one. And that would be the one where both the negatives are present. Line with “-1” and “-2” together.
Overall, the “~” notation seems like just a way to avoid having a set of values appear together. Convenient, and a good way to optimize more when you have large models, big input spaces, slow tests, difficult problem reports, etc.
Seeding / Forcing Tests In
Seeding. When I hear seeding in test generation, I think about the seed value for a random number generator, because those are often used to help generate tests. Well, with PICT it actually means you can predefine a set of combinations that need to be a part of the final test set.
So let’s try with the first example model from above:
Type: Primary, Logical, Single, Span, Stripe, Mirror, RAID-5
Size: 10, 100, 500, 1000, 5000, 10000, 40000
Format method: quick, slow
File system: FAT, FAT32, NTFS
Cluster size: 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
Compression: on, off
The seed file should be in the same format as the output produced by PICT. Let’s say I want to try all types with all file systems, using the smallest size. So I try with this, leaving “Format method” and “Cluster size” empty for PICT to fill in:
Type     Size   Format method  File system  Cluster size  Compression
Primary  10                    FAT32                      on
Logical  10                    FAT32                      on
Single   10                    FAT32                      on
Span     10                    FAT32                      on
Stripe   10                    FAT32                      on
Mirror   10                    FAT32                      on
RAID-5   10                    FAT32                      on
Primary  10                    FAT                        on
Logical  10                    FAT                        on
Single   10                    FAT                        on
Span     10                    FAT                        on
Stripe   10                    FAT                        on
Mirror   10                    FAT                        on
RAID-5   10                    FAT                        on
Primary  10                    NTFS                       on
Logical  10                    NTFS                       on
Single   10                    NTFS                       on
Span     10                    NTFS                       on
Stripe   10                    NTFS                       on
Mirror   10                    NTFS                       on
RAID-5   10                    NTFS                       on
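Typing out a seed file like this by hand gets tedious, so it may be easier to generate it. A minimal Python sketch, assuming the seed file is tab-separated like the PICT output, with the unspecified columns left as empty fields; the function name `seed_text` is made up for illustration.

```python
import csv
import io
from itertools import product

header = ["Type", "Size", "Format method", "File system", "Cluster size", "Compression"]
types = ["Primary", "Logical", "Single", "Span", "Stripe", "Mirror", "RAID-5"]
file_systems = ["FAT32", "FAT", "NTFS"]

def seed_text():
    """Build seed rows for every type with every file system, smallest size, compression on."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for file_system, type_name in product(file_systems, types):
        # Empty strings leave "Format method" and "Cluster size" open for the generator.
        writer.writerow([type_name, "10", "", file_system, "", "on"])
    return out.getvalue()

text = seed_text()
print(text.count("\n"))        # 22
print(text.splitlines()[1].split("\t"))
# ['Primary', '10', '', 'FAT32', '', 'on']
```

Writing `text` to mymodels/example7.seed would give a file with the header plus the same 21 rows as above.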
To run it:
./pict mymodels/example7.pict /e:mymodels/example7.seed >myoutputs/example7.txt
wc -l myoutputs/example7.txt
73 myoutputs/example7.txt
So in the beginning of this post, the initial model generated 62 combinations. With this seed file, some forced repetition is there and the size goes up to 72. Still not that much bigger, but I guess shows something about how nice it is to have a combinatorial test tool to optimize this type of test set for you.
The actual output:
Type     Size   Format method  File system  Cluster size  Compression
Primary  10     quick          FAT32        2048          on
Logical  10     slow           FAT32        16384         on
Single   10     slow           FAT32        65536         on
Span     10     quick          FAT32        1024          on
Stripe   10     quick          FAT32        8192          on
Mirror   10     quick          FAT32        512           on
RAID-5   10     slow           FAT32        32768         on
Primary  10     slow           FAT          4096          on
Logical  10     quick          FAT          1024          on
Single   10     quick          FAT          32768         on
Span     10     slow           FAT          512           on
Stripe   10     slow           FAT          16384         on
Mirror   10     slow           FAT          8192          on
RAID-5   10     slow           FAT          2048          on
Primary  10     quick          NTFS         65536         on
Logical  10     quick          NTFS         4096          on
Single   10     slow           NTFS         16384         on
Span     10     quick          NTFS         32768         on
Stripe   10     slow           NTFS         1024          on
Mirror   10     slow           NTFS         2048          on
RAID-5   10     quick          NTFS         512           on
Span     40000  slow           FAT          65536         off
Single   5000   quick          NTFS         8192          off
Mirror   1000   quick          FAT32        4096          off
Stripe   100    slow           FAT          32768         off
Primary  500    slow           FAT          512           off
Primary  40000  quick          NTFS         8192          on
Logical  10000  quick          NTFS         32768         off
RAID-5   40000  slow           FAT32        1024          off
Span     100    quick          NTFS         8192          on
Mirror   10000  slow           FAT32        16384         off
Logical  5000   slow           FAT          512           on
Primary  1000   slow           FAT          1024          on
Mirror   5000   quick          FAT32        1024          on
Logical  1000   quick          NTFS         32768         on
Single   40000  slow           FAT32        512           on
Stripe   40000  quick          FAT          16384         on
Logical  100    quick          FAT32        2048          off
Single   100    quick          FAT32        1024          off
Primary  5000   quick          NTFS         32768         off
Single   40000  slow           NTFS         2048          on
Logical  500    quick          FAT32        8192          on
Single   500    slow           NTFS         4096          on
Span     500    quick          FAT32        16384         on
Primary  100    quick          FAT32        512           off
Stripe   1000   slow           FAT32        2048          on
RAID-5   10000  quick          FAT          8192          on
Stripe   10000  slow           NTFS         512           off
Stripe   5000   quick          FAT          65536         on
Mirror   40000  slow           NTFS         32768         on
Primary  10000  quick          NTFS         1024          on
RAID-5   100    quick          FAT          16384         off
Mirror   500    quick          NTFS         1024          on
Single   1000   slow           FAT32        512           on
Span     100    slow           FAT32        4096          off
Span     5000   slow           NTFS         2048          on
RAID-5   40000  slow           FAT          4096          off
Span     1000   slow           FAT32        16384         on
Mirror   100    quick          FAT          65536         on
Single   10000  slow           FAT          4096          off
RAID-5   1000   slow           NTFS         65536         off
Span     10000  slow           NTFS         65536         on
Span     1000   slow           FAT32        8192          off
RAID-5   500    quick          NTFS         32768         off
Stripe   500    slow           FAT          2048          off
RAID-5   5000   slow           NTFS         16384         on
Stripe   5000   slow           FAT32        4096          off
Logical  10     slow           FAT          65536         off
RAID-5   10000  quick          NTFS         2048          on
Primary  1000   slow           FAT          16384         off
Logical  40000  quick          FAT32        8192          on
Primary  500    quick          FAT          65536         on
This output starts with the seeds given, and PICT has done its best to fill in the blanks with such values as to still minimize the test numbers while meeting the combinatorial coverage requirements.
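That the seeds really made it into the output can also be checked mechanically. A minimal Python sketch, assuming seed and output rows are dicts keyed by column name and that an empty seed field means "anything goes"; the function name `matches_seed` is made up, and only the first two seed/output rows from above are used.

```python
def matches_seed(seed_row, output_row):
    """True if the output row agrees with every non-empty field of the seed row."""
    return all(output_row[key] == value for key, value in seed_row.items() if value != "")

seeds = [
    {"Type": "Primary", "Size": "10", "Format method": "", "File system": "FAT32",
     "Cluster size": "", "Compression": "on"},
    {"Type": "Logical", "Size": "10", "Format method": "", "File system": "FAT32",
     "Cluster size": "", "Compression": "on"},
]
outputs = [
    {"Type": "Primary", "Size": "10", "Format method": "quick", "File system": "FAT32",
     "Cluster size": "2048", "Compression": "on"},
    {"Type": "Logical", "Size": "10", "Format method": "slow", "File system": "FAT32",
     "Cluster size": "16384", "Compression": "on"},
]

# The seeded rows come first in the output, so they can be compared position by position.
print(all(matches_seed(s, o) for s, o in zip(seeds, outputs)))  # True
```

The blanks in the seed rows are exactly the fields where PICT was free to pick “quick”/“slow” and a cluster size.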
Personal Thoughts
Searching for PICT and pairwise testing or combinatorial testing brings up a bunch of results and reasonably good articles on the topic. Maybe even more of such practice oriented ones than model-based testing. Maybe because it is simpler to apply, and thus easier to pick up and go in practice?
For example, this has a few good points. One is to use an iterative process to build the input models. So as with everything else, not to expect to get it all perfectly right from the first try. Another is to consider invariants for test oracles. So things that should always hold, such as two nodes in a distributed system never being in a conflicting state when an operation involving both is done. Of course, this would also apply to any other type of testing. The article seems to consider this also from a hierarchical viewpoint, checking the strictest or most critical ones first.
Another good point in that article is to use readable names for the values. I guess sometimes people could use the PICT output as such, to define test configurations and the like for manual testing. I would maybe consider using them more as input for automated test execution, to define parameter values to cover. In such cases, it would be enough to give each value a short name such as “A”, “A1”, or “1”. But looking at the model and the output, it would be difficult to tell which value maps to which symbol. Readable names are just as parseable for the computer but much easier to follow for the human expert.
Combining with Sequential Models
So this is all nice and shiny, but the examples are actually quite simple test scenarios. There are no complex dependencies between them, no complex state that defines what parameters and values are available, and so on. It mostly seems to revolve around what combinations of software or system configurations should be used in testing.
I have worked plenty with model-based testing myself (see OSMO), and actually have talked to some people who have done combinations of combinatorial input generation and model-based testing. I can see how this could be interesting: identify a set of injection points for parameters and values in an MBT model, and use a combinatorial test data generator to build data sets for those injection points. Likely doing some more of this in practice would reveal good insights on what works and what could be done to make the match even better. Maybe someday.
In any case, I am sure combining combinatorial test datasets would work great with other types of sequences as well. I think this could make a very interesting and practical research topic. Again, maybe someday.
Bye Now
In general, this area seems to have great tools for the basic test generation, but is missing some in-depth experiences and guides for how to apply them to more complex software, together with sequential test cases and test generators.
A simpler, yet interesting topic would be to integrate a PICT type generator directly with the test environment: run the combinatorial generator during the test runs, and have it randomize the combinations in slightly different ways during different runs, while still maintaining the overall combinatorial coverage.
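As a starting point for that idea, here is a minimal Python sketch that picks a fresh random seed per test run and passes it to PICT. It assumes PICT's `/r:N` option (randomized generation with seed N) behaves as its help text describes, which is worth verifying against your PICT version; the function name, the default `./pict` path and the seed range are assumptions of this sketch.

```python
import random

def randomized_pict_command(model_path, seed=None, pict="./pict"):
    """Build a PICT command with a randomization seed; returns (command, seed)."""
    if seed is None:
        # Arbitrary range chosen for this sketch, not a documented PICT limit.
        seed = random.randrange(1, 65536)
    # Assumed option: /r:N randomizes generation using seed N.
    return [pict, model_path, f"/r:{seed}"], seed

cmd, seed = randomized_pict_command("mymodels/example1.pict", seed=42)
print(cmd)  # ['./pict', 'mymodels/example1.pict', '/r:42']
```

Logging the returned seed with each test run would make it possible to regenerate the same combinations when a failure needs to be reproduced.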