Since I started the job, I've been running into what I'm calling "combinatorial test case scope creep". For instance:
> CHALLENGE: RENDER A CUSTOM HTML SCREEN WITH THREE LABELLED BUTTONS AND AN IMAGE.
> Browsers: {Chrome, Firefox, Edge, Safari}
> Devices: {iPad 9th Gen, Google Pixel 4, iPhone XR, Macbook Pro, HP EliteBook}
> Languages: {English, Spanish, French}
In this case, the custom screen must render properly in 4 browsers x 5 devices x 3 languages = 60 individual test cases. And that's ignoring all of the other possibilities (landscape/portrait, OS version, browser version, etc.). We can quickly find ourselves with 2000+ possible renderings of the screen.
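The blow-up is literally a cross product, so it can be sketched in a few lines (dimension values taken from the challenge above; this is just to make the arithmetic concrete):

```python
from itertools import product

# The three test dimensions from the challenge.
browsers = ["Chrome", "Firefox", "Edge", "Safari"]
devices = ["iPad 9th Gen", "Pixel 4", "iPhone XR", "MacBook Pro", "EliteBook"]
languages = ["English", "Spanish", "French"]

# Every combination of browser, device, and language.
cases = list(product(browsers, devices, languages))
print(len(cases))  # 4 * 5 * 3 = 60
```

Each new dimension (orientation, OS version, ...) multiplies the total rather than adding to it, which is how 60 cases becomes 2000+ so quickly.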
I'm not a developer myself, so I have no idea how this is handled on the implementation end. The clients are only dimly aware of this problem and will throw a fit if they notice the screen is broken, but they are certainly not running 60-2000 test cases to find out when the screen breaks.
I'd like to help the devs out by steering clients in the right direction, and also adding build instructions to solve this problem at the root. Are there any development principles, testing tactics, historical writings, past HN posts, zen koans, ancient scrolls, etc. that outline and explain solutions to "combinatorial test case scope creep"? Thanks in advance.
[1] Disclaimers: I agree that in many cases it would make more sense for developers to work directly with clients and eliminate my job altogether. I was hired because I have a significant technical background but am not quite at the level of a junior dev. I would set my own team up differently if I owned the company.
Unfortunately, 85% is less than 100%. Testing the correctness of a system like yours probably requires a set of human eyes looking at the screen, and it is probably also a good idea to test all kinds of scenarios -- resizing the window, pressing the tab key, and so on -- to make sure there is no browser logic conflicting with your display logic.
For applications critical for health, safety, gigabucks or survival of the planet, 85% allows a little too much room for catastrophe, and the common solution is bondage and discipline -- vendor supplied hardware with all software and config pre-installed, user hardware configuration protected from user caprice, etc.
For ordinary internet applications in the workaday world, damn-the-torpedoes is the typical response. For example, I am one of the users who is a little more likely to experience the occasional lack of full functionality in UI settings -- my eyes don't like tiny fonts, so I set my font size considerably larger than average and my screen resolution smaller (it always annoys me that buying a high-resolution screen makes things harder to read). Plenty of apps and pages on the internet fail for me. Either the user (one person at a time) or the development org (a large team?) has to get along by going along. Yours may be one of the regrettable situations in which 'going along' means fixing the rare problem you miss that actually impacts a real user -- and there is a pretty high likelihood that the fix for the rare case will cause other breakages. Good luck.
I think a lot of shops employ some kind of heuristic to decide which combinations to focus on, e.g. "most of my serious rendering problems happen with language/font combinations that take up a lot of space and use non-Latin characters at smaller screen sizes", which would lead someone to test those combinations more than ones seen as less risky.
Totally made-up example, of course; you need to build up these kinds of insights from experience (or buy them from someone -- but not me, I'm not a frontend guy).
You could potentially test 'diagonal' slices of your test cases - i.e. you might not test every combination of {Edge, Firefox, Chrome} and {English, Chinese, Spanish} - so your test combinations might be something like {{Edge, English}, {Firefox, Chinese}, {Chrome, Spanish}} - then you don't test every combination, but you get a kind of cross-section that tests every one of your test arguments in at least one scenario. Where there are many parameters and dimensions, you can test a few of these 'diagonals' by cutting the test data in different ways to get pretty-good coverage without suffering so much from combinatorial explosion, and pick carefully which dimension combinations you wish to create test scenarios out of.
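A minimal sketch of one such 'diagonal' slice, using the dimensions from the original post (the function name and the cyclic-walk approach are my own illustration, not an established library):

```python
from itertools import cycle

browsers = ["Chrome", "Firefox", "Edge", "Safari"]
devices = ["iPad 9th Gen", "Pixel 4", "iPhone XR", "MacBook Pro", "EliteBook"]
languages = ["English", "Spanish", "French"]

def diagonal_slice(*dims):
    """Walk every dimension cyclically in lockstep, yielding enough
    combinations that every value of every dimension appears at least
    once. Produces max(len(d)) cases instead of the full cross product."""
    longest = max(len(d) for d in dims)
    cycles = [cycle(d) for d in dims]
    for _ in range(longest):
        yield tuple(next(c) for c in cycles)

cases = list(diagonal_slice(browsers, devices, languages))
print(len(cases))  # 5 cases instead of 4 * 5 * 3 = 60
```

Running a second slice with one dimension rotated or shuffled gives a different cross-section; a few such slices approach what the combinatorial-testing literature calls pairwise (all-pairs) coverage at a tiny fraction of the exhaustive cost.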
You have two kinds of things: button and image.
The buttons can have language localization: English, Spanish, or French.
Each of the above things should be tested for what makes it different from other similar things. Not everything should need to be tested in all ways, all the way down.
Each thing should, as much as possible, validate its own inputs. For example, if you use static type information, it becomes impossible to pass in a missing value without getting an error before anything even runs.
You would still need to test on the various devices. But then again, each part (button or image layout, button text) can first be tested on its own. Building high-quality parts that you can trust to do the right thing pays for itself over time and then some.
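A minimal sketch of that idea, assuming a hypothetical `Button` component (the names and the supported-language set are illustrative, not from the post): the part validates its own inputs once, so invalid combinations are rejected in one focused test instead of being rediscovered inside every browser/device rendering run.

```python
from dataclasses import dataclass

# Illustrative assumption: the app supports exactly these languages.
SUPPORTED_LANGUAGES = {"English", "Spanish", "French"}

@dataclass(frozen=True)
class Button:
    label: str
    language: str

    def __post_init__(self):
        # The part guards its own inputs, so it can be trusted downstream.
        if not self.label:
            raise ValueError("button label must be non-empty")
        if self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")

# One focused test per property of the part itself...
ok = Button(label="Submit", language="Spanish")

# ...and invalid input fails fast, before any rendering is attempted.
try:
    Button(label="Submit", language="German")
    rejected = False
except ValueError:
    rejected = True
```

With each part proven in isolation, the device/browser matrix only has to exercise the composition, not every property of every part.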
If this doesn't make sense to you, perhaps you should invest in better frameworks, languages, or tooling. Otherwise follow the advice of other comments to try to effectively limit your effort.
Disclaimer: I haven't actually tried this approach yet.
Why do you test Chrome and Edge? Don't you trust that they're basically the same browser?
Why do you test against both a MacBook and an HP? Don't you trust that you can render buttons and an image similarly on both platforms?