Best Embedding Models?
Hey HN, which embedding models are people using? There has been so much development around foundational LLMs, but I haven't seen much news about embedding models.
I've liked Qwen and EmbeddingGemma for local search. Qwen because 32K is enough to basically fit a whole page into the context window, and EmbeddingGemma because it's crazy efficient.
Embeddings are easy to fine-tune. Try ModernBERT.
I’m partial to jina.ai — they have open models for code and prose, all easily runnable locally.
Feels like embeddings are underrated compared to the LLM hype, but they're doing great.
Does anyone know a tool for rug checks in crypto?
I’ve been using MixedBread, which is a pretty old model at this point. Recently, I tried comparing it to some newer models and was disappointed that the results weren’t dramatically and uniformly better.
You probably can’t go wrong if you pick a recent one that scores decently well on benchmarks and is at the right price point (or memory requirement) for whatever you’re trying to do.
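If you want to run that kind of comparison on your own data rather than trusting a leaderboard, something like the following might be a starting point. It's a rough sketch, not anyone's official eval: the vectors are made-up toy numbers standing in for whatever your two candidate models would output, and the names (`cosine`, `top_k`, `recall_at_k`, `model_a`, `model_b`) are hypothetical. It ranks documents by cosine similarity and reports recall@k, i.e. how often the known-relevant doc lands in the top k.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

def recall_at_k(query_vecs, doc_vecs, relevant, k):
    """Fraction of queries whose relevant doc appears in the top k."""
    hits = sum(
        1
        for qid, qvec in query_vecs.items()
        if relevant[qid] in top_k(qvec, doc_vecs, k)
    )
    return hits / len(query_vecs)

# Assumption: each query has exactly one known-relevant document.
relevant = {"q1": "d1", "q2": "d2"}

# Toy embeddings (invented for illustration). In practice these would
# come from encoding the same queries and docs with each model.
model_a = {
    "docs": {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]},
    "queries": {"q1": [0.9, 0.1], "q2": [0.1, 0.9]},
}
model_b = {
    "docs": {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]},
    "queries": {"q1": [0.9, 0.1], "q2": [0.6, 0.6]},
}

if __name__ == "__main__":
    for name, model in [("model_a", model_a), ("model_b", model_b)]:
        score = recall_at_k(model["queries"], model["docs"], relevant, k=1)
        print(f"{name}: recall@1 = {score:.2f}")
```

A few dozen real queries with hand-labeled answers is usually enough to tell whether a newer model is actually better for your use case, or just better on the benchmark.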