Disclaimer: I'm from the Marqo team.
ClickHouse already works well for vector search.
For example, if you have one million 1024-dimensional vectors and you find the nearest vectors to a given vector by brute-force search, the query takes about 150 ms, which is good enough for recommendation systems in e-commerce, food tech, and similar applications.
Example:
CREATE TABLE vectors (id UInt64, vector Array(Float32)) ENGINE = Memory;
SET max_block_size = 16; -- 4 KiB per row (1024 x Float32), so 64 KiB per block
INSERT INTO vectors SELECT number, arrayMap(x -> randNormal(0.0, 1.0, x), range(1024)) FROM numbers_mt(1000000); -- 4 GiB
WITH (SELECT vector FROM vectors LIMIT 1) AS target
SELECT count() FROM vectors WHERE NOT ignore(L2SquaredDistance(vector, target)); -- 0.113
SELECT count() FROM vectors WHERE NOT ignore(L2Norm(vector)); -- 0.110
WITH (SELECT vector FROM vectors LIMIT 1) AS target
SELECT count() FROM vectors WHERE NOT ignore(arraySum((x, y) -> x * y, vector, target)); -- 0.150
WITH (SELECT vector FROM vectors LIMIT 1) AS target
SELECT id, L2SquaredDistance(vector, target) AS distance FROM vectors ORDER BY distance LIMIT 10; -- 0.144
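For anyone who wants to see what that last query does conceptually, here is a minimal sketch in plain Python. It is not how ClickHouse implements it, just the same brute-force idea: score every vector against the target and keep the 10 smallest distances. The helper names (`l2_squared_distance`, `nearest`) and the toy sizes are my own assumptions for illustration.

```python
import heapq
import random


def l2_squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vectors, target, k=10):
    """Scan every vector and return the k closest as (id, distance) pairs."""
    scored = ((i, l2_squared_distance(v, target)) for i, v in enumerate(vectors))
    # nsmallest keeps only k candidates in memory, like ORDER BY distance LIMIT k
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


random.seed(0)
dims, rows = 8, 1000  # toy sizes (assumption), far below 1024 x 1,000,000
vectors = [[random.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(rows)]
target = vectors[0]  # same trick as SELECT vector FROM vectors LIMIT 1
top = nearest(vectors, target, k=10)
```

Because the target is taken from the table itself, the first result is the target row with distance zero, which is also what you should expect from the SQL version.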
It is open source, super fast, and really easy to work with. It also handles huge volumes easily: we even have it running with a billion objects.
Disclaimer: I work for MyScale.
https://clickhouse.com/docs/en/sql-reference/functions/dista...