I hadn't done much ARM before working on the game, but after playing a lot I've become very quick at reading disassembly, even for instructions that don't appear in the game. It might help you to do the same: the timed aspect of the game forces you to learn to read the instructions quickly.
The game is like Tetris, but the blocks are ARM assembly instructions. As instructions fall, you can change the operand registers. Locking instructions into the .text section executes them in a CPU emulator running client-side in the browser, so you can immediately see the effects of every action. Your score is stored in memory at the address pointed to by one of the registers, so even though you earn points for each instruction executed without segfaulting, the true goal is to execute instructions that directly change the memory containing the score value.
When I released it a bit less than a year ago, I posted it to Hacker News as a Show HN:
private uint asmBitswap32(uint x) @trusted pure
{
    asm pure nothrow @nogc { naked; }

    version (D_InlineAsm_X86_64)
    {
        version (Win64)
            asm pure nothrow @nogc { mov EAX, ECX; }
        else
            asm pure nothrow @nogc { mov EAX, EDI; }
    }

    asm pure nothrow @nogc
    {
        // Author: Tiago Gasiba.
        mov   EDX, EAX;
        shr   EAX, 1;
        and   EDX, 0x5555_5555;
        and   EAX, 0x5555_5555;
        shl   EDX, 1;
        or    EAX, EDX;
        mov   EDX, EAX;
        shr   EAX, 2;
        and   EDX, 0x3333_3333;
        and   EAX, 0x3333_3333;
        shl   EDX, 2;
        or    EAX, EDX;
        mov   EDX, EAX;
        shr   EAX, 4;
        and   EDX, 0x0f0f_0f0f;
        and   EAX, 0x0f0f_0f0f;
        shl   EDX, 4;
        or    EAX, EDX;
        bswap EAX;
        ret;
    }
}
The compiler will handle all the program setup and teardown, so you can concentrate on the assembly part. You can also compile programs with the -vasm switch and the compiler will emit the asm corresponding to the code:

int square(int x) { return x * x; }
compiling: dmd -c test.d -vasm
prints: _D4test6squareFiZi:
0000: 0F AF C0 imul EAX,EAX
0003: C3 ret
By trying simple expressions like `x * x`, examining what the compiler generates, and looking up the instructions in the referenced link, you'll get the hang of it pretty quickly.
"Getting Started with LLVM Core Libraries: Get to Grips with LLVM Essentials and Use the Core Libraries to Build Advanced Tools"
"The Architecture of Open Source Applications (Volume 1): LLVM": https://aosabook.org/en/v1/llvm.html
"Tourist Guide to LLVM source code": https://blog.regehr.org/archives/1453
LLVM home page: https://llvm.org/
LLVM tutorial: https://llvm.org/docs/tutorial/
LLVM reference: https://llvm.org/docs/LangRef.html
Learn by example, C source code to LLVM bitcode: https://stackoverflow.com/questions/9148890/how-to-make-clan...
If you are interested in ARM or RISC-V assembly, the concepts are pretty similar but the instructions are different. For any architecture, you're going to have to read the architecture manuals to get a good working knowledge of the instructions and how to use them. An easy way to get started is to write a program in C, then replace the functions with assembly code one by one until your C code is just main() and a header.
ARMv7: https://developer.arm.com/documentation/100076/0200/a32-t32-...
ARMv8: https://developer.arm.com/documentation/ddi0602/2024-03/Base...
alternative: https://www.scs.stanford.edu/~zyedidia/arm64/
RISC-V: https://riscv.org/technical/specifications/
x86: https://www.intel.com/content/www/us/en/developer/articles/t...
web format: http://x86.dapsen.com/
If you like to learn by example (most of these are not great, but good enough to get started):
Step #1: read the arch manual for some CPU. Read most if not all of it. It's a lot of reading, but it's worth it. My first was PowerPC and my second was x86. By the time I got to ARM, I only needed to use the manual as a reference. These days I would start with x86, because the manuals are well written and easily available, and so is the hardware.
Step #2: compile small programs for that arch using GCC, clang, whatever and then dump disassembly and try to understand the correspondence between your code and the instructions.
[1]: https://developer.apple.com/documentation/apple-silicon/cpu-...
[2]: https://www.intel.com/content/www/us/en/developer/articles/t...
As a good refresher on assembly and compilers
0. https://cs.lmu.edu/~ray/notes/nasmtutorial/
1. https://www.intel.com/content/www/us/en/developer/articles/t...
The one resource they don't list is the ISA manual, called the Principles of Operation; the latest version can be found here: https://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf
It is actually pretty amazing how easy it is to learn other architectures once you understand how one or two work.
A typical function has two kinds of assembly code:
(1) The ABI-required logic for functions and function calls, and
(2) Everything else, which can be more or less whatever you want, as long as you don't stomp on the details required by the ABI.
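As a concrete (hypothetical) illustration of that split, here is what a two-argument add might look like on x86-64 under the System V ABI, in NASM syntax:

```asm
global add2
add2:
    push rbp          ; (1) ABI bookkeeping: preserve the caller's frame pointer
    mov  rbp, rsp
    ; (2) "everything else": args arrive in rdi/rsi, result goes in rax
    mov  rax, rdi
    add  rax, rsi
    pop  rbp          ; (1) restore ABI state and return to the caller
    ret
```

On another platform the register names and prologue differ, but the two-part shape is the same.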
These resources are mostly aimed at solving problems for which compilers are not very useful, so there are probably other resources that are a better fit.
[0]: https://github.com/Highload-fun/platform/wiki
[1]: https://www.intel.com/content/www/us/en/content-details/6714...
I know it's not about LLVM and JIT etc., but IMHO the basics come first, and then you move up; otherwise it's confusing.
I would also recommend NASM's guide for syntax and such. https://www.nasm.us/xdoc/2.13.03rc1/html/nasmdoc0.html
For long-out-of-date assemblers, this YouTube channel: https://www.youtube.com/@ChibiAkumas
For modern assemblers, this YouTube channel: https://www.youtube.com/@WhatsACreel
From my experience, Intel's x86 manual is better and easier to read than AMD's. It's a free download.
The other is the assembler - what syntax it gives you, how it handles macros, whether it optimises, whether it does any semantic analysis. GNU AS is different to NASM is different to flat assembler.
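For a concrete taste of the syntax differences, here is the same x86-64 memory load written for two of those assemblers (illustrative only):

```asm
; NASM (Intel syntax): destination first, brackets for memory, no register sigils
mov eax, [rbx+8]

# GNU as (AT&T syntax): source first, % on registers, displacement(base) addressing
movl 8(%rbx), %eax
```

Macro systems and directives diverge even more than the instruction syntax does, so it pays to pick one assembler and learn it well.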
I didn't get much out of reading compiler disassembly relative to handwritten assembly. I'd recommend trying to find some of the latter, might need to be maths libs or video codecs or similar. I'd be interested in recommendations here, the asm I learned from was proprietary.
If you want the full monty, I think you'll have to read the LLVM documentation on JIT linking: https://llvm.org/docs/JITLink.html
I haven't found any academic papers or tutorials on JIT linking, unfortunately.
https://www.intel.com/content/www/us/en/developer/articles/t...
You will want the first two volumes. For LLVM and JIT work you don't need the last two volumes.
Not kind, or gentle, but certainly definitive and authoritative.
BTW, if anybody has recommendations for assembly in the context of crash dumps, I’d be very appreciative.
It very thoroughly describes Cortex-M0 assembly language and also touches on multiprocessor programming. And you just need two Raspi Picos (one to serve as the programmer), which are widely available.
This is different. I would suggest "Intel® 64 and IA-32 Architectures Optimization Reference Manual", as well as https://www.agner.org/optimize/ .
One of the issues with modern processors (this wasn't true back in the day with the old 8-bitters) is that the processor is so much faster than memory access that this needs to be taken into consideration when writing optimized code. Instruction timings (number of clock cycles) for memory access are going to vary a lot depending on where the data is being held - in cache or in main memory. Writing optimized code (high level as well as assembler) therefore becomes not just a matter of making the code itself as minimal and fast as possible, but also organizing the program's data access to operate out of cache as much as possible and minimize main memory access.

The key is to be sensitive to the layout of your data in memory, and try to have your inner loops/code access nearby (same cache line) data rather than hopping about all over the place. For example, if you have a 2-D array that's laid out in memory row by row (vs. col by col), then you want to access it that way too (work on rows) to take advantage of cache.
I used to write a lot of 8-bit assembler back in the day (as well as more recently for some retro-computing fun), but never x86, so don't have any specific resources to share. Once you've learned the basics of the instruction set, a good point to start might be to take some simple functions and compile to assembler both with and without optimization enabled - and try coding the same function yourself in assembler to see if you can beat the compiler. Search for "x86 tricks" type of resources too - the things that other assembly programmers have learnt how to optimize use of the instruction set and write fast and compact code.
Note that cache considerations apply to code as well as data, so you want your code to be compact (fit as much of your inner loops into cache as possible) and to branch as little as possible, for two reasons: first, you want to take advantage of cache by executing consecutive instructions as far as possible; second, branching kills the pipelining performance of modern processors (you are throwing away work already done), even though they try to mitigate this with branch prediction.
It could be one option.
Many people who work in DS/ML can barely get by with Python.