Is it reasonable to attempt to generate embeddings for every log event that hits the system? At 1TB/day, that works out to roughly 1B log events per day (assuming ~1KB per event), which is over 10k per second.
Would I just have to sample some tiny percentage of log events to generate embeddings for?
The volume feels too high, but I’m curious if others do this successfully. I want this to be reasonably cheap, like less than 1 cent per million log events.
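For context, here's the rough napkin math I'm working from (the ~1KB average event size is just my assumption, since that's what 1TB/day and ~1B events implies):

  # Napkin math, assuming ~1 KB average event size
  BYTES_PER_DAY = 1e12           # 1 TB/day of raw logs
  AVG_EVENT_BYTES = 1_000        # assumed average event size
  SECONDS_PER_DAY = 86_400

  events_per_day = BYTES_PER_DAY / AVG_EVENT_BYTES      # ~1e9 events/day
  events_per_second = events_per_day / SECONDS_PER_DAY  # ~11.6k events/sec

  # Cost target: under 1 cent per million events embedded
  MAX_COST_PER_MILLION = 0.01    # dollars
  max_daily_spend = (events_per_day / 1e6) * MAX_COST_PER_MILLION

  print(f"{events_per_day:,.0f} events/day, {events_per_second:,.0f} events/sec")
  print(f"budget ceiling at 1 cent/million: ~${max_daily_spend:,.0f}/day")

So by my own target I'd only have roughly $10/day to spend on embedding the full firehose.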
Twitter seems to be doing something like this for all tweets at much higher volume. But I don’t want to spend too much money :)
https://learn.netdata.cloud/docs/ml-and-troubleshooting/mach...