HACKER Q&A
📣 quacked

How can I help my devs solve “combinatorial test case scope creep”?


I work in between clients and programmers; my job is essentially to decompose client requirements into design documentation to be built by developers. [1]

Since I started the job, I've been running into what I'm calling "combinatorial test case scope creep". For instance:

> CHALLENGE: RENDER A CUSTOM HTML SCREEN WITH THREE LABELLED BUTTONS AND AN IMAGE.

> Browsers: {Chrome, Firefox, Edge, Safari}

> Devices: {iPad 9th Gen, Google Pixel 4, iPhone XR, Macbook Pro, HP EliteBook}

> Languages: {English, Spanish, French}

In this case, the custom screen must render properly in 4 browsers x 5 devices x 3 languages = 60 individual test cases. And that's ignoring all of the other possibilities (landscape/portrait, OS version, browser version, etc.). We can quickly find ourselves facing 2000+ possible renderings of the screen.

Not being a developer myself, I have no idea how this is handled on the implementation end. The clients themselves are only dimly aware of this problem and will throw a fit if they notice the screen is broken, but they are certainly not running 50-2000 test cases themselves to see when the screen breaks.

I'd like to help the devs out by steering clients in the right direction, and also adding build instructions to solve this problem at the root. Are there any development principles, testing tactics, historical writings, past HN posts, zen koans, ancient scrolls, etc. that outline and explain solutions to "combinatorial test case scope creep"? Thanks in advance.

[1] Disclaimers: I agree that in many cases it would make more sense for developers to work directly with clients and eliminate my job altogether. I was hired because I have a significant technical background but am not quite at the level of a junior dev. I would set my own team up differently if I owned the company.


  👤 lucas_membrane Accepted Answer ✓
There is a sort-of rule of thumb that about 85% of defects will show up if you test all pairs of the factors. You can almost certainly test all pairs in your case if you test all combinations of browsers with devices (20 cases) and assign the languages so that each language is tested at least once on each browser and each device. Check out the math for 'orthogonal arrays' if you want to see what is known about how to cover all pairs of factors with a minimum number of test cases. If you have four or more dimensions in your space of test cases and want to cover all combinations of three factors, see 'orthogonal arrays of strength 3.'
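The pairwise idea above can be sketched in code. This is a minimal greedy approximation (not a true orthogonal array, which gives tighter guarantees), using the factor levels from the question as hypothetical inputs: repeatedly pick the full combination that covers the most still-uncovered pairs of factor levels until every pair appears in some test.

```python
from itertools import product

# Hypothetical factor levels taken from the question.
factors = {
    "browser": ["Chrome", "Firefox", "Edge", "Safari"],
    "device": ["iPad 9th Gen", "Pixel 4", "iPhone XR", "MacBook Pro", "EliteBook"],
    "language": ["English", "Spanish", "French"],
}

def greedy_pairwise(factors):
    """Greedily pick test cases until every pair of factor levels is covered."""
    names = list(factors)
    # Every (factor, level) pair across two different factors must co-occur
    # in at least one selected test case.
    uncovered = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for va, vb in product(factors[a], factors[b]):
                uncovered.add(((a, va), (b, vb)))

    candidates = [dict(zip(names, combo)) for combo in product(*factors.values())]
    suite = []
    while uncovered:
        def gain(case):
            # Number of still-uncovered pairs this candidate would cover.
            items = list(case.items())
            return sum(
                (items[i], items[j]) in uncovered
                for i in range(len(items))
                for j in range(i + 1, len(items))
            )
        best = max(candidates, key=gain)
        items = list(best.items())
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                uncovered.discard((items[i], items[j]))
        suite.append(best)
    return suite

suite = greedy_pairwise(factors)
print(len(suite), "tests cover all pairs, versus", 4 * 5 * 3, "exhaustive")
```

The suite can never be smaller than the two largest dimensions multiplied together (here 4 browsers x 5 devices = 20), so the greedy result lands near 20 instead of 60; real tools and true orthogonal arrays get closer to that lower bound than this sketch does.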

Unfortunately, 85% is less than 100%. And testing correctness of the system in your case probably requires a set of human eyes looking at the screen. It may also require testing all kinds of scenarios (changing the window size, the user pressing the tab key, etc.) to make sure that there is no browser logic conflicting with your display logic.

For applications critical for health, safety, gigabucks or survival of the planet, 85% allows a little too much room for catastrophe, and the common solution is bondage and discipline -- vendor supplied hardware with all software and config pre-installed, user hardware configuration protected from user caprice, etc.

For ordinary internet applications in the workaday world, damn-the-torpedoes is the typical response. For example, I am one of the users who is a little more likely to experience the occasional lack of full functionality in UI settings: I have eyes that don't like tiny fonts, so I set my font size considerably larger than average and my screen resolution smaller (it always annoys me that buying a screen with higher resolution makes things harder to read). Plenty of apps and pages on the internet fail for me. Either the user (1 person at a time) or the development org (large team?) has to get along by going along. Your case may be one of the regrettable situations in which 'going along' means fixing the rare problem that you miss that also actually impacts a real user, but there is a pretty high likelihood that the fix for the rare case will cause other breakages. Good luck.


👤 captainbland
It depends on your risk appetite. You can test exhaustively, and that's safe but probably slow.

I think a lot of shops employ some kind of heuristic to figure out which combinations to focus on, e.g. "most of my most serious rendering problems happen with language/font combinations that take up a lot of space and use non-Latin characters at smaller screen sizes", which might lead someone to test those combinations more heavily than ones seen as less risky.

Totally made-up example, of course; you need to build up these kinds of insights from experience (or buy them from someone - but not me, I'm not a frontend guy).

You could potentially test 'diagonal' slices of your test cases - i.e. you might not test every combination of {Edge, Firefox, Chrome} and {English, Chinese, Spanish} - so your test combinations might be something like {{Edge, English}, {Firefox, Chinese}, {Chrome, Spanish}} - then you don't test every combination, but you get a kind of cross-section that tests every one of your test arguments in at least one scenario. Where there are many parameters and dimensions, you can test a few of these 'diagonals' by cutting the test data in different ways to get pretty-good coverage without suffering so much from combinatorial explosion, and pick carefully which dimension combinations you wish to create test scenarios out of.
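The 'diagonal' slicing above is easy to mechanize. A minimal sketch, using hypothetical dimension values from this thread: step through the longest dimension once while cycling the shorter ones, so every value of every dimension shows up in at least one scenario; different offsets cut different diagonals.

```python
from itertools import cycle

# Hypothetical dimensions from the comment above.
browsers = ["Edge", "Firefox", "Chrome", "Safari"]
devices = ["iPad 9th Gen", "Pixel 4", "iPhone XR", "MacBook Pro", "EliteBook"]
languages = ["English", "Chinese", "Spanish"]

def diagonal_slice(*dims, offset=0):
    """One 'diagonal' cross-section: as many scenarios as the longest
    dimension, cycling shorter dimensions so each value appears at least once."""
    longest = max(len(d) for d in dims)
    # Rotating by the offset yields a different diagonal through the grid.
    rotated = [list(d[offset % len(d):]) + list(d[:offset % len(d)]) for d in dims]
    iters = [cycle(d) for d in rotated]
    return [tuple(next(it) for it in iters) for _ in range(longest)]

# 5 scenarios instead of 4 x 5 x 3 = 60, yet every browser, device,
# and language is exercised somewhere.
for combo in diagonal_slice(browsers, devices, languages):
    print(combo)
```

Taking two or three slices at different offsets pairs the values up differently each time, which is the "cutting the test data in different ways" idea without the full combinatorial explosion.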


👤 karmakaze
Any time there's a combinatorial number, you haven't factored the problem. The custom screen has four things: three buttons and an image.

You have two kinds of things: button and image.

The button can have language localization: English, Spanish, French

Each of the above things should be tested for what makes it different from other similar things. Not everything should need to be tested in all ways, all the way down.

Each thing should, as much as possible, validate its inputs. E.g. if you use static type information, it becomes impossible to pass in a missing value without getting an error before even trying to run anything.

You would still need to test on the various devices. But then again, each part (button or image layout, and button text) could be tested first on its own. Building high-quality parts that you can trust to do the right thing pays for itself over time and then some.

If this doesn't make sense to you, perhaps you should invest in better frameworks, languages, or tooling. Otherwise follow the advice of other comments to try to effectively limit your effort.


👤 Blackthorn
Dev here. How I'd personally approach this is to try to figure out which tests have marginal value given what's already been run. So if I've tested {chrome, ipad9, English} and {Firefox, pixel 4, Spanish}, there's the question of whether {Chrome, ipad9, Spanish} would really add any value here.

👤 gashmol
Generally speaking, you can reduce the combinatorics in tests by using techniques like cause-effect graphing (from The Art of Software Testing). It also contains notation for cases that "can't happen". It's very technical so I'm not sure if it will solve your problem.

👤 sidpatil
You might find the approach used in UnitTestDesign.jl [1] to be useful. If I understand it correctly, the package tests combinations where more than one parameter changes, since it assumes this is when breakage is more likely to occur, versus the exhaustive testing you're trying to avoid.

Disclaimer: I haven't actually tried this approach yet.

[1] https://github.com/adolgert/UnitTestDesign.jl


👤 giantg2
I hate testing. Is it possible to write the cases just without those system properties then run them through AWS Device Farm somehow?

👤 Raed667
Why do you need to test against multiple languages? Don't you trust the review process for when you add i18n translations?

Why do you test Chrome and Edge? Don't you trust that they're basically the same browser?

Why do you test against both a MacBook and an HP? Don't you trust that you can render buttons and an image similarly on both platforms?


👤 bloqs
Hey quacked, what sort of job role do you have? Sounds cool.