If I want to find anomalies among them, what would be the way to go? I saw that k-means isn't the best method.
I'm not looking for examples that are only slightly different from the others, but for examples that are VERY different. If you've ever done web development, you have probably at some point received a strange error inside a JSON response instead of what you expected. I want to be able to catch those with an algorithm.
Why do I want to do that? I use a few APIs, but sometimes they change their responses or return unknown response bodies. I want my algorithm/model to detect these and show me a list of the biggest anomalies.
If I manage to do it, I'll make sure it's open source. Also, if you know an easy way or an OSS solution, please share. Hell, even if you just know what I should study! I was studying deep learning but didn't find any methods I knew of that I could use to make sense of this data.
You should probably start off trying the simplest thing that could possibly work for your use-case.
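As one example of "the simplest thing", here is a minimal sketch that uses no ML at all. It assumes the raw response bodies have already been collected as strings. Each body is reduced to a structural signature (its key paths plus value types), and bodies are ranked by how rarely their signature occurs, so a one-off error payload or an HTML error page floats to the top. The function names `shape`, `signature` and `rank_by_rarity` are hypothetical, not from any library.

```python
import json
from collections import Counter

def shape(value, path="$"):
    """Yield (path, type name) pairs describing the structure of a JSON value."""
    if isinstance(value, dict):
        yield (path, "object")
        for key, child in value.items():
            yield from shape(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield (path, "array")
        for child in value:
            yield from shape(child, f"{path}[]")
    else:
        # Python type names: str, int, float, bool, NoneType
        yield (path, type(value).__name__)

def signature(raw):
    """Return a hashable structural signature; unparseable bodies share one."""
    try:
        return frozenset(shape(json.loads(raw)))
    except json.JSONDecodeError:
        return frozenset({("$", "invalid-json")})

def rank_by_rarity(bodies):
    """Return (count, body) pairs, rarest structure first."""
    sigs = [signature(b) for b in bodies]
    counts = Counter(sigs)
    return sorted(((counts[s], b) for s, b in zip(sigs, bodies)), key=lambda p: p[0])
```

Values are ignored on purpose; only the shape matters, which is usually what changes when an API starts returning something unexpected. Note that int and float count as different types here, which you may or may not want.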
What is an anomalous JSON file other than a JSON file that does not meet the specification[0]?
I have never gotten a "strange error" from a JSON parser. Most JSON parsers are very specific about whatever character they dislike. I would suggest that the algorithm you're seeking is in fact whatever is giving you the error.
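To illustrate the point about parsers being specific, here is a small sketch using Python's standard `json` module, whose `json.JSONDecodeError` carries `msg`, `lineno` and `colno` attributes. The helper name `parse_or_explain` is hypothetical.

```python
import json

def parse_or_explain(raw):
    """Parse a response body, or return the parser's own description of the problem."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as err:
        # msg, lineno and colno pinpoint the offending character
        return None, f"{err.msg} at line {err.lineno}, column {err.colno}"
```

For example, feeding it an HTML error page such as `<html>` yields the parser's message "Expecting value" at line 1, column 1, which already tells you the body was not JSON at all.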
If you're speaking to an API returning JSON, then you should be able to determine what the API is supposed to return to you. Many times different responses contain meaning about why the response is different than expected, like HTTP status codes.
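A minimal sketch of checking a response against what the API is supposed to return. It assumes you already have the status code and raw body (no HTTP client is shown), and the expected status and key set below are a hypothetical contract for one endpoint; you would fill them in from the API's documentation.

```python
import json

# Hypothetical contract for one endpoint: adjust to what the API documents.
EXPECTED_STATUS = 200
EXPECTED_KEYS = {"id", "name", "created_at"}

def check_response(status, raw_body, expected_status=EXPECTED_STATUS, expected_keys=EXPECTED_KEYS):
    """Return a list of problems; an empty list means the response looks as expected."""
    problems = []
    if status != expected_status:
        problems.append(f"unexpected status {status}")
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError as err:
        problems.append(f"invalid JSON: {err.msg}")
        return problems
    if not isinstance(body, dict):
        problems.append(f"expected a JSON object, got {type(body).__name__}")
        return problems
    keys = set(body)
    missing = expected_keys - keys
    extra = keys - expected_keys
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems
```

Any non-empty result is worth looking at, and the status code alone often explains why the body differs.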
Deep learning is a tool for solving a problem. Until you have a well-defined problem, it will be difficult to apply machine learning techniques to it.
The docs are missing some of this. If you jump into Slack, we'd be happy to help.