My initial idea is to come up with a set of all user actions or inputs that are available and then simulate those events happening in various orders to see if things happen as intended or break.
Would welcome any thoughts or experience doing this.
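Here is a minimal sketch of that idea in Python. It assumes a hypothetical `Cart` class as the system under test (with a deliberately planted bug) and a hand-written invariant. It enumerates every sequence of actions up to a small length with `itertools.product`, replays each sequence on a fresh instance, and collects the sequences that break the invariant. The number of sequences grows as (number of actions)^(length), so this only works for very short sequences.

```python
from itertools import product

class Cart:
    """Hypothetical system under test, with a planted bug in remove()."""

    def __init__(self):
        self.items = 0
        self.checked_out = False

    def add(self):
        if self.checked_out:
            raise RuntimeError("cart is closed")
        self.items += 1

    def remove(self):
        self.items -= 1  # Bug: no check for an empty cart

    def checkout(self):
        if self.items == 0:
            raise ValueError("empty cart")
        self.checked_out = True

ACTIONS = ["add", "remove", "checkout"]
# Refusals the system is allowed to raise for actions done in the wrong order.
EXPECTED_ERRORS = (RuntimeError, ValueError)

def invariant(cart):
    return cart.items >= 0

def find_failures(max_len):
    """Replay every action sequence of length 1..max_len on a fresh Cart."""
    failures = []
    for n in range(1, max_len + 1):
        for seq in product(ACTIONS, repeat=n):
            cart = Cart()
            for action in seq:
                try:
                    getattr(cart, action)()
                except EXPECTED_ERRORS:
                    continue  # action was refused; state is unchanged
                if not invariant(cart):
                    failures.append(seq)
                    break
    return failures

if __name__ == "__main__":
    for seq in find_failures(3)[:3]:
        print(seq)
```

`find_failures(1)` already reports `('remove',)`, because removing from an empty cart drives the count below zero.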
https://csrc.nist.gov/projects/automated-combinatorial-testi...
First, the framework you probably want for thinking about this is Parnas's Trace Assertion Method. It gives you the tools to think about the problem without getting caught up in particular details.
There was a paper that looked at serious failures in distributed systems, and found that most of them were caused by a small number of factors. So you get a lot of power without going into all possible sequences. I am completely failing to find it, but it's well known so hopefully someone whose memory is doing better than mine can post it. Or if I remember, I'll come back and add a comment.
I like to think of it as separating the input space into boundaries and bulk (I've described it further here: https://madhadron.com/posts/easier_faster_testing.html ). No actions, empty lists, lists with one element, two actions of different kinds: these are all on the boundary of the input space. For the kind of systems we write as programmers, behavior tends to be fairly uniform once you're in the bulk, so you sample the bulk and then test closely at the boundary, where behavior changes a lot. You may also have boundaries in the middle of the bulk (and in numerical work, the boundaries can move around a lot if you don't have a good mathematical understanding of the system), but it is still usually possible to sample each region of the bulk and get good coverage of the boundaries.
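Here is a small Python sketch of the boundary/bulk split for a function that takes a list of actions. The action kinds are hypothetical. Treating anything with three or more actions as "bulk" is an arbitrary choice for illustration. The fixed seed keeps the bulk samples identical on every run.

```python
import random

KINDS = ["add", "remove", "rename"]  # hypothetical action kinds

def boundary_cases():
    """Hand-picked inputs on the edge of the input space."""
    cases = [[]]                                   # no actions at all
    cases += [[k] for k in KINDS]                  # exactly one action
    cases += [[a, b] for a in KINDS for b in KINDS if a != b]  # two of different kinds
    return cases

def bulk_samples(n, max_len=20, seed=0):
    """A few seeded random points from the interior, where behavior is assumed uniform."""
    rng = random.Random(seed)
    return [[rng.choice(KINDS) for _ in range(rng.randint(3, max_len))]
            for _ in range(n)]

def make_inputs(n_bulk=10):
    """Every boundary case, plus a sample of the bulk."""
    return boundary_cases() + bulk_samples(n_bulk)
```

Each input is then fed to the system under test, and whatever property should hold is checked. Most of the effort goes into the boundary list, and the bulk only gets a handful of samples.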
QuickCheck and other property based testing libraries have been used for this. The Erlang QuickCheck library was used for testing phone handsets by throwing random event streams at them until they crashed. It's a very powerful tool for this purpose.
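Here is a rough sketch of the random-event-stream approach in plain Python. This is not the Erlang QuickCheck API. It throws seeded random event sequences at a hypothetical `Player` state machine, stops at the first sequence that crashes, and then greedily shrinks that sequence by deleting events while it still crashes. Libraries such as QuickCheck shrink failing inputs automatically and more cleverly. The naive version here only shows the idea.

```python
import random

EVENTS = ["play", "pause", "stop", "call_incoming", "call_ended"]

class Player:
    """Hypothetical device under test, with a planted bug around calls."""

    def __init__(self):
        self.state = "stopped"
        self.resume_stack = []  # states to restore when a call ends

    def handle(self, event):
        if event == "play":
            self.state = "playing"
        elif event == "pause" and self.state == "playing":
            self.state = "paused"
        elif event == "stop":
            self.state = "stopped"
        elif event == "call_incoming":
            self.resume_stack.append(self.state)
            self.state = "in_call"
        elif event == "call_ended":
            # Bug: assumes a call is in progress; pop() fails on an empty stack.
            self.state = self.resume_stack.pop()

def crashes(events):
    """Replay events on a fresh Player and report whether it raised."""
    player = Player()
    try:
        for event in events:
            player.handle(event)
    except Exception:
        return True
    return False

def shrink(events):
    """Greedily delete single events while the sequence still crashes."""
    changed = True
    while changed:
        changed = False
        for i in range(len(events)):
            candidate = events[:i] + events[i + 1:]
            if crashes(candidate):
                events, changed = candidate, True
                break
    return events

def search(tries=1000, length=10, seed=42):
    """Throw seeded random event streams at the Player; return a shrunk crash or None."""
    rng = random.Random(seed)
    for _ in range(tries):
        events = [rng.choice(EVENTS) for _ in range(length)]
        if crashes(events):
            return shrink(events)
    return None
```

`search()` returns `['call_ended']`. The shrinker strips every other event and leaves the minimal trigger for the planted bug: ending a call that never started. Because the seed is fixed, the same run can be replayed exactly.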
The problem with this approach is that it results in "wobbly" tests, that is, tests that fail non-deterministically. These are really annoying and will quickly cause developers to ignore test failures. So I don't like them as part of a standard testing suite.
It sounds like you may want to put more boundaries in your application. Rather than testing all possible combinations, reduce the available combinations and test that those restrictions are enforced. I think your best starting point is to place hard limits on input and test up to those limits. For example, define the maximum length of inputs, allowable character sets, numeric ranges, etc., then write test cases to ensure those limits are enforced.
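Here is a minimal sketch of hard input limits, with tests that sit exactly on and just past each limit. The field names, limits, and character set are hypothetical. In practice they would come from your own requirements.

```python
import re

# Hypothetical limits, chosen only for illustration.
MAX_NAME_LEN = 32
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
MIN_QTY, MAX_QTY = 1, 99

def validate(name: str, quantity: int) -> list[str]:
    """Return the list of violated limits; an empty list means the input is accepted."""
    errors = []
    if not 1 <= len(name) <= MAX_NAME_LEN:
        errors.append("name length")
    elif not NAME_PATTERN.fullmatch(name):
        errors.append("name charset")
    if not MIN_QTY <= quantity <= MAX_QTY:
        errors.append("quantity range")
    return errors

def test_limits():
    assert validate("a" * MAX_NAME_LEN, MAX_QTY) == []             # exactly at the limits
    assert validate("a" * (MAX_NAME_LEN + 1), 1) == ["name length"]
    assert validate("", 1) == ["name length"]
    assert validate("bad name!", 1) == ["name charset"]
    assert validate("ok", MIN_QTY - 1) == ["quantity range"]
    assert validate("ok", MAX_QTY + 1) == ["quantity range"]
```

With the input space fenced in like this, the tests only need to probe the edges of each limit rather than arbitrary combinations.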
It's difficult to know what the next step would be without knowing your tech stack or your goal. Why do you want to test various permutations of user input? Is there some cause for concern if a user performs Action B, then Action A, instead of the predefined Action A, then Action B? If that's the case, you should encode these paths into your software and have test cases to check them.
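One way to encode allowed paths is an explicit transition table, so that any order not listed is rejected. Here is a minimal Python sketch that assumes a hypothetical workflow in which Action A must come before Action B.

```python
# Hypothetical workflow: action "A" must happen before action "B".
TRANSITIONS = {
    ("start", "A"): "after_A",
    ("after_A", "B"): "done",
}

class Workflow:
    def __init__(self):
        self.state = "start"

    def do(self, action: str) -> None:
        """Apply an action, rejecting any path not listed in TRANSITIONS."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]

def run_workflow(actions):
    workflow = Workflow()
    for action in actions:
        workflow.do(action)
    return workflow.state
```

Testing the out-of-order path then becomes a simple check that `run_workflow(["B", "A"])` raises `ValueError`, while `run_workflow(["A", "B"])` reaches `"done"`.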
It sounds to me like you're looking for a magic bullet to get comprehensive testing without writing comprehensive tests. That's not really possible. You have to test what's important, then rely on logical implication for the rest. Sprinkle in good logging and error handling for the inevitable hiccup, and monitor the logs for things to improve.
This won't help you test UI code directly, but it's a great way to make sure your system's core concepts work.
https://www.hillelwayne.com/post/tla-messages/ is a good article for getting a sense of what TLA+ is about.
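TLA+ specifications are usually checked with the TLC model checker, which explores the reachable states of a specification and checks invariants in each one. The Python toy below is not TLA+. It only illustrates that style of exhaustive state exploration. Two hypothetical processes each increment a shared counter with a separate read and write, and a breadth-first search finds the shortest interleaving that loses an update.

```python
from collections import deque

# State: (program counters, per-process temp values, shared counter).
INIT = (("read", "read"), (0, 0), 0)

def next_states(state):
    """Every state reachable in one step: either process may run next."""
    pcs, tmps, counter = state
    for i in (0, 1):
        pc, tmp = list(pcs), list(tmps)
        if pcs[i] == "read":
            pc[i], tmp[i] = "write", counter       # read the shared counter
            yield (tuple(pc), tuple(tmp), counter)
        elif pcs[i] == "write":
            pc[i] = "done"                          # write back tmp + 1
            yield (tuple(pc), tuple(tmp), tmps[i] + 1)

def invariant(state):
    pcs, _, counter = state
    # Once both processes finish, both increments must be visible.
    return pcs != ("done", "done") or counter == 2

def check(init):
    """Breadth-first search; returns the shortest trace to a violation, or None."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

`check(INIT)` returns a five-state trace in which both processes read 0 before either writes, so the counter ends at 1 instead of 2. Because visited states are deduplicated, this scales much further than replaying every action sequence separately.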
There is also property based testing which generates random data for the tests.
You can read more about this in the documentation for the fast-check npm package: https://www.npmjs.com/package/fast-check
I know there are plugins for Cypress, and most likely for other tools, for setting up model-based tests.
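The core of model-based testing can be sketched without any plugin. You drive the real implementation and a much simpler reference model with the same random commands, and flag the first step where their results differ. The ring-buffer queue and list-based model below are hypothetical examples in plain Python. They are not the API of a Cypress plugin or of fast-check.

```python
import random

class RingQueue:
    """Hypothetical system under test: fixed-capacity FIFO on a circular buffer."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # index of the oldest element
        self.size = 0

    def push(self, x):
        if self.size == len(self.buf):
            return False  # full: reject
        self.buf[(self.head + self.size) % len(self.buf)] = x
        self.size += 1
        return True

    def pop(self):
        if self.size == 0:
            return None
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x

class ModelQueue:
    """Reference model: obviously correct, not necessarily efficient."""

    def __init__(self, capacity):
        self.items, self.capacity = [], capacity

    def push(self, x):
        if len(self.items) == self.capacity:
            return False
        self.items.append(x)
        return True

    def pop(self):
        return self.items.pop(0) if self.items else None

def check_against_model(sut_factory=RingQueue, runs=200, steps=30, capacity=3, seed=0):
    """Apply the same random commands to both; return the first diverging trace, else None."""
    rng = random.Random(seed)
    for _ in range(runs):
        sut, model = sut_factory(capacity), ModelQueue(capacity)
        trace = []
        for step in range(steps):
            if rng.random() < 0.5:
                trace.append(("push", step))
                got, want = sut.push(step), model.push(step)
            else:
                trace.append(("pop",))
                got, want = sut.pop(), model.pop()
            if got != want:
                return trace
    return None
```

For the correct `RingQueue` this should return `None`. If you pass a deliberately broken `sut_factory`, such as a subclass whose `pop` forgets to advance `head`, it should return the command trace that exposed the difference.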
There's a similar chaos tool that simulates random user input.
What I would add to this is a bit more on how I think about choosing a strategy.
When thinking about infinite combinations, it is important to acknowledge that exhaustive testing (testing all combinations of input and system states) is impossible or at least not feasible.
If we accept this truth, then we should focus on the kind of testing that produces the biggest impact, i.e., that helps discover the most significant risks, where risk = probability of occurrence × impact on the user/system.
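To make the risk formula concrete, here is a tiny Python sketch that ranks risks by probability × impact. The risks and their numbers are made up for illustration. Real estimates would come from the team and from production data.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    probability: float  # estimated chance of occurring, 0..1
    impact: int         # estimated severity, e.g. 1 (cosmetic) .. 5 (data loss)

    @property
    def score(self) -> float:
        return self.probability * self.impact

def prioritize(risks):
    """Highest probability x impact first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)

# Hypothetical risks with made-up estimates.
risks = [
    Risk("checkout fails for saved cards", 0.05, 5),
    Risk("typo on FAQ page", 0.5, 1),
    Risk("cart total rounds wrong", 0.2, 4),
]

if __name__ == "__main__":
    for r in prioritize(risks):
        print(f"{r.score:.2f}  {r.name}")
```

Here the rounding bug ranks first (0.80). The frequent but harmless typo comes next (0.50), ahead of the rare but severe checkout failure (0.25). That ordering is exactly what the estimates drive, so they deserve scrutiny.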
In this case, the focus should be: how do we identify the essential scenarios (series of steps/actions) that users perform?
Two significant ways to approach this question:
1. Having a decision framework or model to pick these essential scenarios
2. Having a stochastic model, reduced to a feasible number of cases based on how the causes of errors are distributed by the number of conditions occurring at the same time
--
Let's start with the second one, as I think it is what you are looking for: a stochastic model that generates combinations of actions a user can take that lead to a bug.
Here there are two things which can be easily applied:
1. Not every combination of parameters contributes equally to the causes of bugs: most bugs are triggered by a single condition, the next largest share by two conditions occurring at the same time, and progressively fewer as the number of simultaneous conditions grows to 4, 5, 6 or 7.
2. We need a method that covers every combination of values for any two conditions/parameters. For example, a web app has parameters such as feature, browser version, OS version, installed extensions ... Testing all combinations means testing the Cartesian product of these sets (features × browser versions × OS versions ...), which can be very large. A feasible alternative is to make sure every pair of values from any two sets appears in at least one test: every browser with every feature, every feature with every extension, every browser with every extension, and so on.
This technique is called Pairwise Testing: the goal is a set of tests that together exercise every discrete combination of values for each pair of parameters.
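Here is a naive greedy way to build a pairwise suite, sketched in Python. It lists every value pair that must be covered, then repeatedly picks the full combination that covers the most still-uncovered pairs. This is a simple illustration, not the algorithm used by any particular pairwise tool. Because it scans the whole Cartesian product, it only suits small parameter sets. The parameter values are hypothetical.

```python
from itertools import combinations, product

def pairwise(params: dict[str, list]) -> list[dict]:
    """Greedy suite in which every value pair of every two parameters appears at least once."""
    names = list(params)
    # Every (param_a, value_a, param_b, value_b) pair that still needs covering.
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va in params[a]
        for vb in params[b]
    }
    candidates = list(product(*(params[n] for n in names)))  # small inputs only

    def pairs_of(row):
        return {(a, row[i], b, row[j])
                for (i, a), (j, b) in combinations(enumerate(names), 2)}

    suite = []
    while uncovered:
        # Pick the full combination that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        uncovered -= pairs_of(best)
        suite.append(dict(zip(names, best)))
    return suite

if __name__ == "__main__":
    params = {
        "browser": ["Chrome", "Firefox", "Safari"],
        "os": ["Windows", "macOS", "Linux"],
        "extension": ["none", "adblock"],
        "feature": ["search", "checkout"],
    }
    suite = pairwise(params)
    print(len(suite), "tests instead of", 3 * 3 * 2 * 2)
```

The suite can never be smaller than the largest pair of parameters (3 × 3 = 9 browser/OS pairs here). The gap between that size and the full product widens quickly as parameters are added.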
Applying this to your case, as far as I understand it from your question, let's say a user can perform a range of actions A1, A2, A3 ... An in various chains. The first step would be to group these actions into categories and then apply Pairwise Testing across these category sets. Of course, this is general advice, since I don't know your exact app or what those actions are.
I think this is a good candidate for automated testing, and it should be chosen when there is little knowledge about the business domain (or the domain is complex) and we cannot know user behaviors or needs.
The main problem with this approach is that it will not find the most important bugs first.
--
Now let's analyse the other strategy: having a decision framework or model to pick these essential scenarios.
I think one should consider this because testing should also focus on the user and on what the user is trying to achieve with the software we produce.
This strategy requires a series of decisions about what to focus on and what to let go, such as what kind of risk the business is willing to accept:
1. First, we have to define which users we focus on. Most of the time, we cannot focus on all users, so we have to pick a group of them. This is the first decision: which group of users should be our primary focus.
2. We need to decide which things (features, groups of features) our product provides that are most valuable to these users. This second decision is about the core value our product brings to this group of users.
3. Then we need to prioritize these values for our users and, based on this, decide which risks we want to cover first with our testing.
After making these three decisions, we should have a list of scenarios (chained steps) our users perform in our product, which should be as close as possible to what the users are really trying to achieve.
So the testing should focus on:
1. Achieving 100% execution coverage on these scenarios
2. Generating step mutations to cover extra cases. How big this set is will depend on how much time is available.
A simple heuristic for generating these mutations for scenarios is to see the scenario as a chain of steps and apply one of the following techniques (but make sure you keep the last step in place):
1. Randomly remove a step and try to access the next one. For example, from steps A, B, C, D, E, F, delete B and try to go from A to C. In e-commerce, that means trying to GET Index (A) and then GET Shopping cart (C) without going through ADD Product to Cart (B).
2. Randomly replace a step in the chain with one from outside it. For example, during a checkout scenario, replace one of its steps with a GET that requests a password.
3. Swap the order of two adjacent steps. For example, from scenario A,B,C,D,E,F generate: B,A,C,D,E,F ; A,C,B,D,E,F ; A,B,D,C,E,F ...
4. Duplicate a step. For example, from scenario ABCDEF generate: AABCDEF, ABBCDEF ...
Why leave the last step in place? Because it keeps the assertion focused on that step, which is probably where the user receives the value of their time investment in using the product.
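The four mutation heuristics can be sketched in a few Python functions. The step names are hypothetical stand-ins for requests like GET Index. Every mutation leaves the final step in place, so the assertion can keep targeting it.

```python
import random
from functools import partial

def remove_step(steps, rng):
    """Drop one step (never the last), e.g. go from A straight to C."""
    i = rng.randrange(len(steps) - 1)
    return steps[:i] + steps[i + 1:]

def replace_step(steps, rng, outside):
    """Replace one step (never the last) with a step from outside the scenario."""
    i = rng.randrange(len(steps) - 1)
    return steps[:i] + [rng.choice(outside)] + steps[i + 1:]

def swap_adjacent(steps, rng):
    """Swap two neighbouring steps, both of which come before the last step."""
    i = rng.randrange(len(steps) - 2)
    mutated = list(steps)
    mutated[i], mutated[i + 1] = mutated[i + 1], mutated[i]
    return mutated

def duplicate_step(steps, rng):
    """Repeat one step (never the last) right after itself, e.g. A,A,B,..."""
    i = rng.randrange(len(steps) - 1)
    return steps[:i + 1] + steps[i:]

def mutants(steps, outside, n, seed=0):
    """n mutated scenarios (needs at least 3 steps); the final step always stays last."""
    rng = random.Random(seed)
    operators = [remove_step, swap_adjacent, duplicate_step,
                 partial(replace_step, outside=outside)]
    return [rng.choice(operators)(steps, rng) for _ in range(n)]

if __name__ == "__main__":
    scenario = ["GET index", "ADD product to cart", "GET cart", "POST checkout"]
    for m in mutants(scenario, outside=["GET password reset"], n=5):
        print(m)
```

Each mutant is then run like the original scenario, with the same assertion on the final step. Whether that step should now succeed or be refused depends on which mutation was applied.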
--
Of course, what I described above ignores the state of the system and possible side effects from one state to another, which are important and should be taken into consideration, but I think this is not what you are looking to cover with testing.
I am not sure I managed to explain my approach to testing well in this case. I really hope I will have more time this year to write about this subject and learn to express it more concisely and clearly.