HACKER Q&A
📣 nlitened

Why are 1.00-bit LLMs not used?


I've seen papers on 1.58-bit LLMs, where the weights are restricted to -1, 0, and 1, and they show good accuracy at much smaller model sizes.

But I haven't been able to find any LLMs with strict 1.00-bit weights (just the values 0 and 1). I guess one could represent negative values by keeping separate "positive" and "negative" matrices and then adding/subtracting the results of the two multiplications. To me this looks like a very efficient approach:

1) huge memory and energy savings,

2) computationally, the dot product is just POPCNT(A & B), and the matrices can be laid out very efficiently in memory (a 64-byte cache line holds 512 weights!), so matrix multiplication should be very fast too (see the rough sketch after this list),

3) it should run very fast even on a CPU,

4) there should be no loss of precision compared to a 1.58-bit LLM.
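
To make point 2 concrete, here is roughly the inner loop I have in mind. It's only a minimal C sketch, assuming the activations have also been binarized to 0/1 and the weights are split into "positive" and "negative" bit-matrices as described above; all the names are made up for illustration, not taken from any paper.

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS 8  /* 8 x 64 bits = 512 weights, i.e. one 64-byte cache line */

/* One output element of a binary matrix-vector product.
   Weights are stored as two bitmasks per row: w_pos (bit set where w = +1)
   and w_neg (bit set where w = -1). Activations x are 0/1 bits.
   dot = POPCNT(x & w_pos) - POPCNT(x & w_neg)
       = sum_i x_i * w_i  for ternary w_i = w_pos_i - w_neg_i. */
static int64_t dot_1bit(const uint64_t *x,
                        const uint64_t *w_pos,
                        const uint64_t *w_neg)
{
    int64_t acc = 0;
    for (int i = 0; i < WORDS; i++) {
        acc += __builtin_popcountll(x[i] & w_pos[i]);
        acc -= __builtin_popcountll(x[i] & w_neg[i]);
    }
    return acc;
}

int main(void)
{
    uint64_t x[WORDS]     = {0xFFFFFFFFFFFFFFFFULL}; /* first 64 activations = 1 */
    uint64_t w_pos[WORDS] = {0x0000000000000003ULL}; /* weights 0 and 1 are +1   */
    uint64_t w_neg[WORDS] = {0x0000000000000004ULL}; /* weight 2 is -1           */
    printf("%lld\n", (long long)dot_1bit(x, w_pos, w_neg)); /* prints 1 (= 2 - 1) */
    return 0;
}
```

In a real layer the accumulator would presumably still have to be rescaled and re-quantized before feeding the next layer, but the hot loop itself is just AND + POPCNT over cache-line-sized blocks.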

What are the downsides of this approach? Where could I read about it?


  👤 ClassyJacket Accepted Answer ✓
I don't know what I'm talking about, but don't you need more than one bit to have non-linearity?