HACKER Q&A
📣 nlitened

Why are 1.00-bit LLMs not used?


I've seen papers on 1.58-bit LLMs, where the weights are restricted to -1, 0, and 1, and they show good accuracy at much smaller model sizes.

But I haven't been able to find any LLMs with strict 1.00-bit weights (just the values 0 and 1). I guess one could represent negative values by keeping separate "positive" and "negative" matrices and then adding/subtracting the results of the two multiplications. To me this looks like a very efficient approach:

1) huge memory and energy savings,

2) computationally, the dot product is just POPCNT(A & B), and the matrices can be laid out very efficiently in memory (a 64-byte cache line holds 512 weights!), so matrix multiplication should be very fast too (see the rough sketch after this list),

3) it should run very fast even on a CPU,

4) there should be no loss of precision compared to a 1.58-bit LLM.
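
To make point 2 concrete, here is roughly the inner loop I have in mind. It's only a minimal C sketch, assuming the activations have also been binarized to 0/1 and the weights are split into "positive" and "negative" bit-matrices as described above; all the names are made up for illustration, not taken from any paper.

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS 8  /* 8 x 64 bits = 512 weights, i.e. one 64-byte cache line */

/* One output element of a binary matrix-vector product.
   Weights are stored as two bitmasks per row: w_pos (bit set where w = +1)
   and w_neg (bit set where w = -1). Activations x are 0/1 bits.
   dot = POPCNT(x & w_pos) - POPCNT(x & w_neg)
       = sum_i x_i * w_i  for ternary w_i = w_pos_i - w_neg_i. */
static int64_t dot_1bit(const uint64_t *x,
                        const uint64_t *w_pos,
                        const uint64_t *w_neg)
{
    int64_t acc = 0;
    for (int i = 0; i < WORDS; i++) {
        acc += __builtin_popcountll(x[i] & w_pos[i]);
        acc -= __builtin_popcountll(x[i] & w_neg[i]);
    }
    return acc;
}

int main(void)
{
    uint64_t x[WORDS]     = {0xFFFFFFFFFFFFFFFFULL}; /* first 64 activations = 1 */
    uint64_t w_pos[WORDS] = {0x0000000000000003ULL}; /* weights 0 and 1 are +1   */
    uint64_t w_neg[WORDS] = {0x0000000000000004ULL}; /* weight 2 is -1           */
    printf("%lld\n", (long long)dot_1bit(x, w_pos, w_neg)); /* prints 1 (= 2 - 1) */
    return 0;
}
```

In a real layer the accumulator would presumably still have to be rescaled and re-quantized before feeding the next layer, but the hot loop itself is just AND + POPCNT over cache-line-sized blocks.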

What are the downsides of this approach? Where could I read about it?


  👤 ClassyJacket Accepted Answer ✓
I don't know what I'm talking about, but don't you need more than one bit to have non-linearity?