HACKER Q&A
📣 rectang

More Ergonomic Assembly Language?


It has been many decades since the AT&T and Intel syntaxes were introduced, and we've learned a lot about programming language design since then.

For example, consider how AT&T assembler uses parentheses to dereference memory locations: `(%eax)`. Such a usage of parentheses is unlike that of most programming languages, and unlike how parentheses are used in mathematical notation. You could say it "violates the principle of least surprise".
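To make the contrast concrete, here is a minimal Python sketch that rewrites the simplest AT&T memory operands into the bracketed Intel form. The function name `att_mem_to_intel` is made up for illustration, and the sketch assumes only the `disp(%base)` shape; index and scale forms such as `(%eax,%ebx,4)` are deliberately rejected rather than handled.

```python
import re

# Matches only the simplest AT&T memory operand: an optional displacement
# followed by (%base). Index/scale forms are intentionally not supported.
_ATT_MEM = re.compile(
    r"^(?P<disp>-?(?:0x[0-9a-fA-F]+|\d+))?\(%(?P<base>[a-z0-9]+)\)$"
)


def att_mem_to_intel(operand):
    """Rewrite e.g. '(%eax)' as '[eax]' and '8(%ebp)' as '[ebp+8]'."""
    m = _ATT_MEM.match(operand.strip())
    if m is None:
        raise ValueError(f"unsupported operand: {operand!r}")
    base, disp = m.group("base"), m.group("disp")
    if disp is None:
        return f"[{base}]"
    # A negative displacement already carries its own sign.
    sign = "" if disp.startswith("-") else "+"
    return f"[{base}{sign}{disp}]"
```

Calling `att_mem_to_intel("(%eax)")` yields `[eax]`, which is at least closer to the array-indexing brackets most programmers already know.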

Another problem is width specification: typing e.g. `DWORD` (in all caps) a la Intel to specify width is verbose and laborious, but appending a letter to the instruction name a la AT&T makes those names harder for the human eye to parse.

Then there's the inscrutable naming of registers and instructions. Ideally we'd want them to follow "Huffman Coding" naming principles, where high-value short, clear names are assigned to the most commonly used elements and rarer elements get somewhat longer, more explicit names. Unfortunately this is a problem for hardware manufacturers, but let's dream for a moment that they're listening.

What would a more ergonomic assembly language look like?


  👤 nsajko Accepted Answer ✓
First, here are two projects that may interest you. Both tried to provide a somewhat unified assembly syntax across ISAs, with more regular naming of registers, etc.: https://9p.io/sys/doc/asm.html https://tip.golang.org/doc/asm

Now, are you proposing to change "ordinary" assembly (with a simple one-to-one mapping between assembly and machine instructions), or rather something like a high-level assembler (https://en.wikipedia.org/wiki/High-level_assembler)? I think it is important in this kind of discussion to spell out what we use assembly languages for. The main uses I know about are:

1) Documenting an ISA or compiler or similar.

2) Reading machine code as disassembly. (For debugging, profiling, etc.)

3) Rewriting hot spots from within a high-level language so that they run more efficiently.

4) Using specific instructions not accessible through compiler intrinsics. This is mainly for kernel or embedded work.

I think that for all four uses one would want an "ordinary" low-level assembly (including the Plan 9 or Go assembly languages). Honestly, I have never used a high-level assembly and have no idea why somebody would use one, except perhaps for "demos", so that one could brag of having written the whole thing in "assembly". In any case, you should clarify which use of assembly language you have in mind for this discussion.

> It has been many decades since the AT&T and Intel syntaxes were introduced, and we've learned a lot about programming language design since then.

I don't think programming language design applies very much to assembly language, at least the usual, low-level kind. Assembly definitely is a programming language, but only a trivial case of one: the assembler does not have to bother with types or with translating high-level constructs into lower-level forms. Assembly is much more "descriptive" of the end result (machine code) than high-level programming languages are. Another point is that assembly is, as a formal language, much simpler than high-level programming languages, so it is already easy to understand and there is necessarily little room for improvement.

> Such a usage of parentheses is unlike that of most programming languages, and unlike how parentheses are used in mathematical notation. You could say it "violates the principle of least surprise".

The "principle of least surprise" sucks. It stifles innovation for little gain. But my last point also applies here: I think it does not matter that the principle of least surprise is violated, because assembly will be simple anyway.

> Another problem is width specification: typing e.g. `DWORD` (in all caps) a la Intel to specify width is verbose and laborious, but appending a letter to the instruction name a la AT&T makes those names harder for the human eye to parse.

I'm fine with the AT&T syntax, but maybe it would be better to specify width with numeric suffixes (the operand size in bytes) instead of letter suffixes, e.g.: MOV1, MOV2, MOV4, MOV8 instead of MOVB, MOVW, MOVL, MOVQ.
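As a rough sketch of that renaming (in Python, just to pin the mapping down): the helper `numeric_suffix` and the table `WIDTH_BYTES` are hypothetical names, not part of any existing assembler. The base mnemonic is passed in explicitly, on the assumption that the caller knows it, since the suffix cannot be guessed reliably from the string alone (CALL ends in L but carries no width suffix).

```python
# Letter width suffix -> operand size in bytes, as in MOVB/MOVW/MOVL/MOVQ.
WIDTH_BYTES = {"B": 1, "W": 2, "L": 4, "Q": 8}


def numeric_suffix(mnemonic, base):
    """Rewrite e.g. 'MOVL' as 'MOV4', given the known base mnemonic 'MOV'."""
    name = mnemonic.upper()
    stem = base.upper()
    # Require exactly one extra character after the base mnemonic.
    if not name.startswith(stem) or len(name) != len(stem) + 1:
        raise ValueError(f"{mnemonic!r} is not {base!r} plus a width suffix")
    letter = name[-1]
    if letter not in WIDTH_BYTES:
        raise ValueError(f"unknown width suffix: {letter!r}")
    return f"{stem}{WIDTH_BYTES[letter]}"
```

With this, `numeric_suffix("MOVL", "MOV")` gives `MOV4` and `numeric_suffix("MOVQ", "MOV")` gives `MOV8`.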

> Then there's the inscrutable naming of registers and instructions. Ideally we'd want them to follow "Huffman Coding" naming principles, where high-value short, clear names are assigned to the most commonly used elements and rarer elements get somewhat longer, more explicit names.

Regarding registers: we don't need that. Registers come from small finite sets. It is better to have regular and small names for registers, at least with the current ISAs.

Regarding instructions: that's kind of what we already have.
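For anyone who wants to check that intuition against real data, here is a small Python sketch of the "Huffman" idea: it computes the code length Huffman coding would assign to each mnemonic given usage counts. The counts in `EXAMPLE_COUNTS` are invented purely for illustration (they are not measurements), and `huffman_lengths` is a hypothetical helper name.

```python
import heapq
from itertools import count


def huffman_lengths(freqs):
    """Return the Huffman code length (in bits) for each symbol in freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside them
        # moves one level deeper, so its code grows by one bit.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


# Hypothetical usage counts, made up for this example only.
EXAMPLE_COUNTS = {"mov": 50, "add": 20, "cmp": 15, "jmp": 10, "cpuid": 1}
```

Feeding in counts from an actual disassembly and comparing the resulting lengths with the lengths of the real mnemonics would show how close existing names already are to that ideal.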