On the topic of classification, large language models such as BERT are the state of the art. But a recent paper in the Findings of ACL [1] showed that a simple approach, gzip compression plus k-nearest neighbours, achieves a similar level of performance. Their kNN evaluation methodology later came under question: with k=2, they counted a prediction as correct if either of the two nearest neighbours had the true label, which is top-2 accuracy rather than standard kNN accuracy. Even so, the result suggests we don't need LLMs for simple tasks such as classification, where traditional techniques still work well and are much cheaper to run. Some might say the paper should simply be discarded because of that error, but Sebastian Raschka re-ran the method on the IMDB movie reviews dataset and still reported 71% accuracy [3]. I wrote an explainer article on it [2].
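To make the idea concrete, here's a minimal sketch of the approach as I understand it (my own reconstruction in Python, not the authors' code): compute the normalized compression distance (NCD) between the test document and every training document using gzip's compressed lengths, then take a majority vote over the k nearest neighbours. The comment on the vote marks where the evaluation issue crept in.

    import gzip
    from collections import Counter

    def clen(s):
        # length of s after gzip compression
        return len(gzip.compress(s.encode("utf-8")))

    def ncd(x, y):
        # normalized compression distance: compressing x and y together
        # is cheap when they share patterns, so similar texts score low
        cx, cy = clen(x), clen(y)
        cxy = clen(x + " " + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    def predict(test_text, train, k=2):
        # train: list of (text, label) pairs
        # standard kNN: majority vote over the k nearest neighbours;
        # with k=2 ties are common, which is why counting a hit when
        # *either* of the top 2 matched inflated the reported accuracy
        nearest = sorted(train, key=lambda pair: ncd(test_text, pair[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

Note this is quadratic in the number of documents (every test document gets compressed against every training document), so it only makes sense for small datasets, which is exactly the low-resource setting the paper targets.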
[1] https://aclanthology.org/2023.findings-acl.426.pdf [PDF]
[2] https://codeconfessions.substack.com/p/decoding-the-acl-pape...
[3] https://magazine.sebastianraschka.com/p/large-language-model...