How should I test all potential combinations of user actions?

Question

For complex software, there are infinite combinations of actions that the user can do in a certain order. What are the leading approaches to testing that none of those actions break the software?
My initial idea is to come up with a set of all user actions or inputs that are available and then simulate those events happening in various orders to see if things happen as intended or break.
Would welcome any thoughts or experience doing this.

TheHideout · Accepted Answer

I would recommend looking at Design of Experiments methodology using Pairwise or Fractional Factorial methods combined with some form of automation.
https://csrc.nist.gov/projects/automated-combinatorial-testi...

madhadron · Answer

There are a few things to know about this.
First, the framework you probably want for thinking about this is Parnas's Trace Assertion Method. It gives you the tools to think about the problem without getting caught up in particular details.
There was a paper that looked at serious failures in distributed systems, and found that most of them were caused by a small number of factors. So you get a lot of power without going into all possible sequences. I am completely failing to find it, but it's well known so hopefully someone whose memory is doing better than mine can post it. Or if I remember, I'll come back and add a comment.
I like to think of it as separating the input space into boundaries and bulk (I've described it further here: https://madhadron.com/posts/easier_faster_testing.html ). No actions, empty lists, lists with one element, two actions of different kinds --- these are all the boundary of the input space. For the kind of systems we write as programmers, behavior tends to be fairly uniform once you're into the bulk, so you sample it and then closely test the boundary where stuff changes a lot. Now, you may have boundaries in the middle of the bulk as well (and in numerical work, the boundaries can move around a lot if you don't have a good mathematical understanding of the system), but it is still usually possible to test the various regions of bulk and get good coverage on the boundaries.
QuickCheck and other property based testing libraries have been used for this. The Erlang QuickCheck library was used for testing phone handsets by throwing random event streams at them until they crashed. It's a very powerful tool for this purpose.
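The random-event-stream idea can be sketched without any library. This is a minimal illustration, not QuickCheck itself: the `Counter` system, its deliberate bug, the action names, and the greedy `shrink` step are all hypothetical stand-ins. It throws seeded random action sequences at a toy system, checks an invariant after every event, and reduces a failing sequence to a shorter one.

```python
import random

ACTIONS = ["add", "remove", "reset"]  # hypothetical user actions


class Counter:
    """Toy system under test (hypothetical): stock must never go negative."""

    def __init__(self):
        self.stock = 0

    def apply(self, action):
        if action == "add":
            self.stock += 1
        elif action == "remove":
            # Deliberate bug for the demo: no check for empty stock.
            self.stock -= 1
        elif action == "reset":
            self.stock = 0


def violates(sequence):
    """Replay the sequence; return True if the invariant stock >= 0 breaks."""
    system = Counter()
    for action in sequence:
        system.apply(action)
        if system.stock < 0:
            return True
    return False


def shrink(sequence):
    """Greedily drop single events while the sequence still fails."""
    current = list(sequence)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if violates(candidate):
                current = candidate
                changed = True
                break
    return current


def find_failure(seed, runs=200, max_len=20):
    """Generate random event streams from a fixed seed; return a shrunk failure or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        sequence = [rng.choice(ACTIONS) for _ in range(rng.randint(0, max_len))]
        if violates(sequence):
            return shrink(sequence)
    return None


if __name__ == "__main__":
    print(find_failure(seed=1))
```

Because the generator is driven by an explicit seed, a failing run can be replayed exactly. Real property based testing libraries do the generation and shrinking for you, and far more thoroughly.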

mywittyname · Answer

You certainly could write a procedurally generated test harness, where you generate a graph of user actions, then send them to the system. I've done that before for integration testing of user-generated event pathways. But my main concern was for accuracy and load testing, so at the end of the test, I would query the database and check the report against what the test case sent.
The problem with this approach is that it results in "wobbly" tests, that is, tests that fail non-deterministically. These are really annoying and will quickly cause developers to ignore test failures. So I don't like them as part of a standard testing suite.
It sounds like you may want to put more boundaries in your application. Rather than testing out all possible combinations, reduce the available combinations and test their enforcement. I think your best starting point is to place hard limits on input and test up to those limits. For example, define the maximum length of inputs, allowable character sets, numeric ranges, etc., then write test cases to ensure those limitations are enforced.
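As a rough sketch of what testing limit enforcement could look like: the validator below and its limits (a 32-character name, an alphanumeric character set, a quantity of 1 to 100) are invented for illustration. The test table probes each limit at and just past its boundary.

```python
import re

# Hypothetical limits, chosen only for this example.
MAX_NAME_LENGTH = 32
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MIN_QUANTITY, MAX_QUANTITY = 1, 100


def validate_order(name, quantity):
    """Return a list of violated limits; an empty list means the input is accepted."""
    errors = []
    if not (1 <= len(name) <= MAX_NAME_LENGTH):
        errors.append("name length")
    elif not NAME_PATTERN.fullmatch(name):
        errors.append("name characters")
    if not (MIN_QUANTITY <= quantity <= MAX_QUANTITY):
        errors.append("quantity range")
    return errors


# Each case sits on or just outside one of the limits.
BOUNDARY_CASES = [
    ("a", 1, []),                                # smallest valid input
    ("a" * MAX_NAME_LENGTH, MAX_QUANTITY, []),   # largest valid input
    ("", 1, ["name length"]),                    # empty name
    ("a" * (MAX_NAME_LENGTH + 1), 1, ["name length"]),
    ("bad name!", 1, ["name characters"]),       # disallowed characters
    ("ok", 0, ["quantity range"]),               # just below the range
    ("ok", MAX_QUANTITY + 1, ["quantity range"]),
]


def run_boundary_cases():
    """Return the cases whose result differs from the expectation."""
    return [
        (name, quantity)
        for name, quantity, expected in BOUNDARY_CASES
        if validate_order(name, quantity) != expected
    ]


if __name__ == "__main__":
    print(run_boundary_cases())
```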
It's difficult to know what the next step would be without knowing your tech stack or your goal. Why do you want to test various permutations of user input? Is there some cause for concern if a user performs Action B, then Action A instead of the predefined Action A, then Action B? If that's the case, you should encode these paths into your software and have test cases to check them.
It sounds to me like you're looking for a magic bullet to get comprehensive testing without writing comprehensive tests. That's not really possible. You have to test what's important, then rely on logical implication for the rest. Sprinkle in good logging and error handling for the inevitable hiccup, and monitor the logs for improvements.

subjectsigma · Answer

Have you looked into model checking with TLA+? It allows you to exhaustively check that assertions are not violated by an abstract model.
This won't help you test UI code directly but it's a great way to make sure your system's core concepts work.
https://www.hillelwayne.com/post/tla-messages/ is a good article for getting a sense of what TLA+ is about.
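To give a feel for what "exhaustively check an abstract model" means, here is a small Python stand-in. It is not TLA+ and not how the TLA+ tools are implemented. The shopping model, its actions, and the planted bug are all hypothetical. It explores every reachable state breadth-first and returns the shortest sequence of actions that breaks an invariant.

```python
from collections import deque

MAX_ITEMS = 2  # keeps the hypothetical model finite

# A state is (logged_in, items_in_cart, paid).
INITIAL_STATE = (False, 0, False)


def next_states(state):
    """Yield (action, new_state) for every action enabled in this state."""
    logged_in, items, paid = state
    if not logged_in:
        yield "login", (True, items, paid)
    if logged_in:
        yield "logout", (False, items, paid)
    if logged_in and not paid and items < MAX_ITEMS:
        yield "add", (True, items + 1, paid)
    if items > 0:
        # Planted bug: removing an item is still allowed after payment.
        yield "remove", (logged_in, items - 1, paid)
    if logged_in and items > 0 and not paid:
        yield "pay", (True, items, True)


def invariant(state):
    """A paid order must never have an empty cart."""
    _, items, paid = state
    return not (paid and items == 0)


def find_counterexample(initial=INITIAL_STATE):
    """Breadth-first search over all reachable states; return the shortest failing trace."""
    seen = {initial}
    queue = deque([(initial, [])])
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        for action, new_state in next_states(state):
            if new_state not in seen:
                seen.add(new_state)
                queue.append((new_state, trace + [action]))
    return None  # invariant holds in every reachable state


if __name__ == "__main__":
    print(find_counterexample())
```

Unlike random testing, this visits every reachable state of the model, which is only practical because the model is much smaller than the real system.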

zerego · Answer

What you're looking for is called model based testing.
There is also property based testing which generates random data for the tests.
You can read more about this in the official documentation of this npm package https://www.npmjs.com/package/fast-check
I know there are plugins for Cypress, and most likely for other tools, that let you set up model based tests.

nikivi · Answer

If it's a website, I'd look into https://www.cypress.io or https://github.com/microsoft/playwright

bnj · Answer

Similar to a sibling comment this sounds like pairwise / combinatorial testing. I learned about it by working through the examples and descriptions here:
https://www.testcover.com/

chrisMyzel · Answer

EDIT: sorry this is the wrong link =) https://netflix.github.io/chaosmonkey/
There's a similar chaos tool that simulates random user input

Arete314159 · Answer

Have you considered hiring a QA and/or SDET? This kind of question is... literally their job.

dyeje · Answer

_Why_ do you need to test all potential combinations? Just test the critical paths, set up error tracking, and handle the long tail of bugs as they come up. Unless you're building a moon rover this level of granularity is unnecessary.

gls2ro · Answer

I think there are already great answers here and very good resources.
What I would add is an explanation of how I think about which strategy to choose.
When thinking about infinite combinations, it is important to acknowledge that exhaustive testing (testing all combinations of input and system states) is impossible or at least not feasible.
If we accept this truth, then we should focus on what kind of testing would produce the biggest impact: i.e., help discover the most significant risks, where risk = probability of happening x impact on user/system.
In this case, the focus should be: how to identify the essential scenarios (series of steps/actions) the users are doing?
Two significant ways to approach this question:
1. Having a decision framework or model to pick these essential scenarios
2. Having a stochastic model reduced to a feasible number of cases based on the distribution of causes of errors per number of conditions happening at the same time
--
Let's talk about the second one as I think this is what you are looking for: having a stochastic model to generate combinations of actions a user can do, leading to a bug.
Here there are two things which can be easily applied:
1. Not every combination of parameters contributes equally to the distribution of causes of bugs: most bugs are triggered by a single condition, fewer by two conditions happening at the same time, and fewer still by 4, 5, 6, or 7 conditions happening at the same time.
2. We need a method to generate all combinations of two conditions/params. For example, in the case of a web app, we have: feature, browser version, OS version, installed extensions ... If we want to test all combinations, that is the Cartesian product of these sets (features, browser versions, OS versions ...), which can be very big. So a feasible approach is to test all unique combinations of any two sets: browser with feature, feature with extensions, browser with extensions ...
This has a name: Pairwise Testing, where we focus on creating tests to execute all discrete combinations of pairs.
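A minimal sketch of pairwise generation, using a simple greedy strategy (one of several possible approaches, and not what any particular tool does). The parameter names follow the web app example above, and the values are made up. It enumerates the full Cartesian product as candidates, so it only suits small parameter sets.

```python
from itertools import combinations, product


def pairwise_suite(parameters):
    """Greedy all-pairs selection.

    parameters maps a parameter name to its list of values.
    Returns a list of test configurations (dicts) such that every pair of
    values from two different parameters appears in at least one test.
    """
    names = list(parameters)

    def pairs_of(test):
        # All value pairs that a single test configuration covers.
        return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}

    # Every value pair that still needs to appear in some test.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    suite = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda test: len(pairs_of(test) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


if __name__ == "__main__":
    # Hypothetical values for the parameters named in the text.
    web_app = {
        "feature": ["search", "checkout", "profile"],
        "browser": ["chrome", "firefox", "safari"],
        "os": ["windows", "macos"],
        "extensions": ["none", "adblock"],
    }
    suite = pairwise_suite(web_app)
    print(len(suite), "pairwise tests instead of", 3 * 3 * 2 * 2)
    for test in suite:
        print(test)
```

The full product here has 36 configurations. Any pairwise suite needs at least 9 (one per feature/browser pair), and the greedy result is not guaranteed to be the smallest possible.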
Applying this to your case, as far as I understand it from your question, let's say a user can do a range of actions A1, A2, A3 ... An in various chains. The first step would be to group these actions into different categories and then apply Pairwise Testing between these category sets. Of course, this is general advice, given without knowing your exact app and what those actions are.
I think this is a good candidate for automated testing, and it should be chosen when there is little knowledge about the business domain (or the domain is complex) and we cannot know user behaviors or needs.
The main problem with this approach is that it will not find the most important bugs first.
--
Now let's analyse the other strategy: having a decision framework or model to pick these essential scenarios.
I think this is worth considering because testing should also focus on the user and on what the user is trying to achieve with the software we are producing.
This strategy means a series of decisions about what to focus on and what to let go, such as what kind of risk the business is willing to accept:
1. First, we have to define which users we focus on. Most of the time we cannot focus on all users, so we have to pick a group of them. This is the first decision: which group of users should be our primary focus.
2. We need to decide the most valuable things (features, groups of features) our product provides to these users. This second decision is about the core value our product brings to this group of users.
3. Then we need to prioritize these values for our users and, based on this, decide which risks we want to cover first with our testing.
After making these three decisions, we should have a list of scenarios (chained steps) our users perform in our product, which should be as close as possible to what the users are really trying to achieve.
So the testing should focus on:
1. Achieve 100% execution coverage on these scenarios
2. Generate step mutations to cover extra cases. How big this set is will depend on how much time is available.
A simple heuristic for generating these mutations for scenarios is to see the scenario as a chain of steps and apply one of the following techniques (but make sure you keep the last step in place):
1. Randomly remove a step and try to access the next one. For example, from steps A, B, C, D, E, F, delete B and try to go from A to C. In e-commerce: GET Index (A) and then GET Shopping cart (C) without going through ADD Product to Cart (B)
2. Randomly replace a step in the chain with one from outside. For example, during a checkout scenario, add a GET to request a password.
3. Invert the order of two adjacent steps. For example, from scenario A,B,C,D,E,F generate: B,A,C,D,E,F ; A,C,B,D,E,F ; A,B,D,C,E,F ...
4. Duplicate a step. Like from scenario ABCDEF generate: AABCDEF, ABBCDEF ...
Why leave the last step in place? Because it is easy to keep the assert focused on it, as that is probably where the user receives the value of their time investment in using the product.
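The four mutation techniques can be sketched as a small generator. This is only an illustration under assumptions: the function name and the `outside_steps` parameter are invented, and it enumerates every single-step mutation rather than picking one at random, so you can sample from the result as time allows. The last step is always kept in place.

```python
def mutate_scenario(steps, outside_steps):
    """Return mutated variants of a scenario, always keeping the last step in place.

    steps: ordered list of step names, e.g. ["A", "B", "C", "D", "E", "F"].
    outside_steps: steps that do not belong to the scenario (hypothetical input).
    """
    *body, last = steps
    variants = []

    # 1. Remove one step.
    for i in range(len(body)):
        variants.append(body[:i] + body[i + 1:] + [last])

    # 2. Replace one step with a step from outside the scenario.
    for i in range(len(body)):
        for extra in outside_steps:
            variants.append(body[:i] + [extra] + body[i + 1:] + [last])

    # 3. Swap two adjacent steps.
    for i in range(len(body) - 1):
        swapped = body[:]
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        variants.append(swapped + [last])

    # 4. Duplicate one step.
    for i in range(len(body)):
        variants.append(body[:i + 1] + [body[i]] + body[i + 1:] + [last])

    return variants


if __name__ == "__main__":
    for variant in mutate_scenario(list("ABCDEF"), outside_steps=["X"]):
        print("".join(variant))
```

Each variant can then be replayed against the application, with the assertion placed on the outcome of the final step.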
--
Of course what I described above ignores the state of the system and possible side effects from one state to another, which are important and should be taken into consideration, but I think this is not what you are looking to cover with testing.
I am not sure I managed to explain my approach to testing well in this case. I really hope this year I will have more time to write about this subject and learn to express myself more concisely and clearly.
