HACKER Q&A
📣 arconis987

How to cheaply use a vector DB to detect anomalies in logs at 1TB / day


I’m interested in playing with vector databases to detect interesting anomalies in a large volume of logs, like 1TB / day.

Is it reasonable to attempt to generate embeddings for every log event that hits the system? At 1TB/day, that's roughly 1B log events per day, i.e. over 10k per second.
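
Back-of-envelope behind those numbers, assuming an average event size of ~1 KB (my assumption, not a measurement):

```python
# 1 TB/day at ~1 KB per event works out to ~1B events/day, >10k/sec.
bytes_per_day = 1e12                  # 1 TB/day
avg_event_bytes = 1_000               # assumed average log event size
events_per_day = bytes_per_day / avg_event_bytes   # ~1e9
events_per_sec = events_per_day / 86_400           # ~11,574
print(f"{events_per_day:.1e} events/day, {events_per_sec:,.0f} events/sec")
```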

Would I just have to sample some tiny percentage of log events to generate embeddings for?

The volume feels too high, but I’m curious if others do this successfully. I want this to be reasonably cheap, like less than 1 cent per million log events.
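
To put the budget in perspective, here's what <1 cent per million events implies at full volume versus a hypothetical hosted embedding API. The per-token price and tokens-per-line below are assumptions for illustration, not quotes from any provider:

```python
# Stated budget vs. a hypothetical hosted embedding API at full volume.
events_per_day = 1e9
budget_per_million = 0.01             # USD, the stated target
daily_budget = events_per_day / 1e6 * budget_per_million       # $10/day

tokens_per_event = 100                # assumed average tokens per log line
api_price_per_mtok = 0.02             # USD per 1M tokens, assumed
api_daily_cost = events_per_day * tokens_per_event / 1e6 * api_price_per_mtok

print(f"budget: ${daily_budget:.0f}/day, hypothetical API: ${api_daily_cost:,.0f}/day")
```

At these assumed prices, embedding everything via a hosted API overshoots the budget by ~200x, which is why I'm asking about sampling.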

Twitter seems to be doing something like this for all tweets at much higher volume. But I don’t want to spend too much money :)


  👤 SushiHippie Accepted Answer ✓
Maybe have a look at what netdata does. It may not be 1-to-1 applicable to your use case, but I've used netdata to monitor my own servers, where it ingests thousands of datapoints per second, and the anomaly detection seems to work.

https://learn.netdata.cloud/docs/ml-and-troubleshooting/mach...
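
As I understand it from those docs, netdata trains small unsupervised k-means models per metric on lagged windows of recent values and scores new points by distance to the nearest centroid. A minimal sketch of that idea, where the window size, k, and threshold are my assumptions rather than netdata's actual parameters:

```python
# Sketch: per-metric k-means anomaly detection on lagged windows.
import numpy as np
from sklearn.cluster import KMeans

def lag_features(series, lags=6):
    # Each row = one value plus its previous `lags` values.
    return np.lib.stride_tricks.sliding_window_view(series, lags + 1)

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 2_000)           # "normal" history for one metric
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(lag_features(train))

# Threshold = 99th percentile of training distances to the nearest centroid.
train_dist = model.transform(lag_features(train)).min(axis=1)
threshold = np.percentile(train_dist, 99)

def anomaly_score(recent):
    # Distance of the latest window to its nearest centroid.
    return model.transform(lag_features(recent)[-1:]).min()

spike = np.concatenate([train[-6:], [8.0]])  # inject an obvious outlier
print(anomaly_score(spike) > threshold)      # expected: True
```

The point being that models this small are cheap enough to run per metric at high ingest rates, without embeddings at all.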


👤 gwnywg
Out of curiosity, are all the logs coming through a single pipe, or is this an aggregate of multiple sources, such that you could apply something before aggregation?
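
For example, collapsing events to templates per source and only embedding a template the first time it appears could cut the volume a lot. A toy sketch, where the regex masking is a stand-in for a real log parser like Drain and the patterns are purely illustrative:

```python
# Toy per-source dedup: mask variable fields, embed only novel templates.
import re
from collections import Counter

MASKS = [
    (re.compile(r"\d+\.\d+\.\d+\.\d+"), "<IP>"),
    (re.compile(r"\b[0-9a-f]{16,}\b"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]

def template(line: str) -> str:
    # Mask the variable parts so structurally identical events collide.
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

seen = Counter()

def should_embed(line: str) -> bool:
    # Embed only the first occurrence of each template; count the rest.
    t = template(line)
    seen[t] += 1
    return seen[t] == 1

print(should_embed("GET /api/v1/users/4821 200 12ms"))  # True: new template
print(should_embed("GET /api/v1/users/9913 200 8ms"))   # False: same shape
```

If most events collapse into a relatively small set of templates, you'd only pay to embed novelty, which changes the cost math entirely.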