Notably, on AlpacaEval 2.0, using solely open-source models, we achieved an absolute improvement of 7.6%, from 57.5% (GPT-4 Omni) to 65.1% (Mixed Model), and 65.7% when using closed-source models.
Is this something that people find useful? I'm not sure what the use cases are, or whether working with a single LLM is enough.
Are you one of the original contributors to the work done at together.ai [1], or are you trying to take credit for it?