HACKER Q&A
📣 helloplanets

What's the best local sentence transformer?


Basically what's in the title. There's been a crazy amount of development in local LLMs if you look at LLaMA, Mistral, etc.

It feels like using OpenAI's Ada to get text embeddings is probably far from the best option at this point. What would be the best / most cost-efficient way of getting text embeddings these days? Preferably open source.


  👤 caprock Accepted Answer ✓
The answer is dependent on the task(s) to which the embeddings will be applied. For general search in industry, the e5 models are well regarded.

A good place to start is this eval system:

https://huggingface.co/spaces/mteb/leaderboard


👤 james-revisoai
Like caprock says, the e5 models are the best tradeoff of model size and embedding speed for the results you get; they will be great at semantic search and similar tasks in English.

Possibly consider cross-encoding for semantic search depending on your use case, but whenever cross-encoding is useful, embeddings like Ada are usually much better... There used to be embeddings useful for things like classifying whether two sentences entail one another, or whether a sentence is complete, but these have been basically completely supplanted these days.

Do consider the all-MiniLM models (the default in Sentence Transformers) for speed or on-device use. Their embeddings are half the size (and therefore less than half the compute for distance functions), so they are faster for large searches etc., which is useful if you run your own stuff with vector stores rather than a service.